
Opening and Closing a file in Python

 

Open a file

Python has a built-in function, open(), for opening a file. A file can be opened for reading, writing, or both. open() returns a file object that is used to read, write, or update the data in the file.

Syntax

fileobject = open(file, mode, buffering, encoding, errors, newline, closefd, opener)

Of all the parameters in the syntax, only file is mandatory. open() accepts the file name and returns the corresponding file object. Using this file object, data can be read from the file or written to it (depending on the mode specified).

file

Before we see the importance of all the parameters, let's have a look at an example using just the mandatory parameter 'file'. 

E.g.: 

test_file = open("test_file.txt")


In the above example, we open the text file 'test_file.txt' so that its contents can be read.

A couple of things to note here:
  • We have not mentioned where the file is located.
  • We have not mentioned the mode (i.e., read, write or update) in which the file is opened.
If we do not mention the file location, Python looks for the file in the current working directory. This is the folder the program was started from, which is often, but not always, the folder where the Python program is located.

If the file doesn't exist in that folder, passing the file name without the location raises 'FileNotFoundError'.

To avoid this error, the full path of the file can be specified in the open function.

E.g.:

test_file = open("C:/Users/Admin/Desktop/test_file.txt")
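The error can also be caught instead of avoided. A minimal sketch, assuming a hypothetical file name that does not exist in the current working directory:

```python
import os

# Hypothetical file name, assumed not to exist in the current working directory.
missing_name = "no_such_file_for_this_demo.txt"

try:
    test_file = open(missing_name)
except FileNotFoundError as exc:
    # Raised because a relative name is resolved against os.getcwd().
    error_name = type(exc).__name__
    print(error_name, "- looked in:", os.getcwd())
else:
    # Only reached if the file unexpectedly exists.
    error_name = None
    test_file.close()
```

Printing os.getcwd() shows which folder Python actually searched, which helps when a relative file name is not found.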


Let's now look at the usage of the optional parameters.

mode

This parameter specifies the mode in which the file is opened. The default value is 'r' (read), so if we don't mention the mode, Python opens the file in read mode. In the example above, test_file.txt is opened in read mode, exactly as in the example below.

E.g.:

test_file = open("test_file.txt", mode = 'r')


One thing to remember here is that the value for the mode is case sensitive. Passing 'R' instead of 'r' results in a value error (ValueError: invalid mode: 'R').
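The case sensitivity can be checked with a short sketch. The temporary folder and file name here are assumptions made only so the example can run anywhere:

```python
import os
import tempfile

# Create a throwaway file so that the mode string is the only problem.
path = os.path.join(tempfile.mkdtemp(), "test_file.txt")
with open(path, "w") as f:
    f.write("hello")

try:
    open(path, mode="R")   # upper-case 'R' is not a valid mode
except ValueError as exc:
    message = str(exc)
    print(message)
```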

Below are the valid values for the mode parameter. 
  • 'r' - File is opened for reading (default).
  • 'w' - File is opened for writing and truncated first. Any data already present in the file is lost as soon as it is opened in write mode. Creates the file if it is not already present.
  • 'x' - File is opened for exclusive creation, failing if the file already exists. If the purpose is to create a new file and write data, 'x' is a safer option than 'w'.
  • 'a' - File is opened for writing, appending to the end of the file if it exists. With 'w', if the file is already present, its data is cleared first. With 'a', the data in the file is not cleared and anything written is added to the end of the file.
These values can be used as is for the purpose specified. There are a few other valid values that are used in combination with the above values.

  • 'b' - binary mode, used for binary files. This is used in combination with any of the modes mentioned above.
    • E.g.: mode = 'rb' for reading binary files.
    • mode = 'wb' for writing binary files.
  • 't' - text mode (default). 't' doesn't need to be mentioned when reading/writing text (i.e., strings), whereas 'b' has to be mentioned explicitly for binary files.
    • E.g.: mode = 'r' and mode = 'rt' both work for the same purpose of reading text data.
  • '+' - open for updating (reading and writing). Mode 'r' alone only allows reading, and 'w', 'x' and 'a' alone only allow writing. Adding '+' allows both.
    • E.g.: mode = 'r+' allows both reading and writing data without truncating the file.
    • mode = 'w+' allows both writing and reading data, but still truncates the file first.
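The modes above can be compared side by side. This is a small sketch that works on a temporary file (the path is an assumption for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'x' creates the file and fails if it already exists.
with open(path, "x") as f:
    f.write("first\n")
try:
    open(path, "x")
    x_failed = False
except FileExistsError:
    x_failed = True

# 'a' keeps the existing data and writes at the end.
with open(path, "a") as f:
    f.write("second\n")
with open(path, "r") as f:
    after_append = f.read()

# 'w' truncates the file as soon as it is opened.
with open(path, "w") as f:
    f.write("third\n")

# 'r+' allows reading and writing without truncating.
with open(path, "r+") as f:
    before = f.read()        # position is now at the end of the file
    f.write("fourth\n")

# 'b' returns bytes instead of str.
with open(path, "rb") as f:
    raw = f.read()

print(repr(after_append), repr(before), type(raw).__name__)
```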

buffering

Buffering is another optional parameter and sets the buffering policy used when reading from or writing to a file. The policy is different for binary and text files.

The default value for buffering is -1, which works as below.
  • For binary files - Data is buffered in fixed-size chunks. Python tries to pick the chunk size from the underlying device's block size and falls back to the default buffer size (can be found using io.DEFAULT_BUFFER_SIZE).
  • For text files - Interactive text files (those for which isatty() returns True) use line buffering. All other text files use the same policy as binary files.
Other valid values are:
  • 0 - To turn off the buffering. 0 is only valid in case of binary files.
  • 1 - To select line buffering. 1 is only valid in case of text files.
  • Any other integer > 1 indicates the size in bytes of a fixed-size chunk buffer.
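A short sketch of how these values behave in practice. The class names shown are what CPython returns and are included only to make the difference visible; the temporary path is an assumption:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffer_demo.bin")

# buffering=0 is accepted in binary mode and gives an unbuffered raw file object.
raw = open(path, "wb", buffering=0)
raw_type = type(raw).__name__
raw.write(b"abc")
raw.close()

# buffering=0 is rejected in text mode.
try:
    open(path, "w", buffering=0)
    text_error = None
except ValueError as exc:
    text_error = str(exc)

# An integer > 1 sets the chunk buffer size in bytes.
buffered = open(path, "rb", buffering=4096)
buffered_type = type(buffered).__name__
buffered.close()

print(raw_type, buffered_type, io.DEFAULT_BUFFER_SIZE)
```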

encoding

Encoding is the name of the encoding used to encode/decode the file. This is only applicable for text files.

The default value for the encoding parameter is None, which means the platform-dependent default encoding is used (it does not mean that no encoding is applied).

The 'codecs' module documentation has the list of supported encodings. The platform default can be found using locale.getpreferredencoding().
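Because the default depends on the platform, passing the encoding explicitly gives the same result everywhere. A minimal sketch, using a sample string and a temporary path chosen only for illustration:

```python
import locale
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")
text = "caf\u00e9"   # "café"

# Write and read back with the same explicit encoding.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "r", encoding="utf-8") as f:
    same = f.read()

# The bytes on disk are the UTF-8 encoding of the text.
with open(path, "rb") as f:
    raw = f.read()            # b'caf\xc3\xa9'

# Reading with a different encoding silently produces different text.
with open(path, "r", encoding="latin-1") as f:
    garbled = f.read()        # 'caf\xc3\xa9' as two separate characters

print("platform default:", locale.getpreferredencoding(False))
```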

errors

Errors defines how encoding and decoding errors are handled. This can only be used with text files.

Many standard error handlers are available (listed under Error Handlers in the 'codecs' documentation). Apart from these, any error handler name registered with codecs.register_error() is also valid.

Some of the valid values are:
  • 'strict' - Raise a ValueError exception if there is an encoding error. Passing None (the default) has the same effect.
  • 'ignore' - To ignore errors. Ignoring errors could result in data loss.
  • 'replace' - To replace the malformed data with a replacement marker (such as '?').
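The three handlers can be compared on the same malformed input. This sketch writes bytes that are not valid UTF-8 to a temporary file (an assumption for illustration) and reads them back three ways. When decoding, the replacement marker is the Unicode character U+FFFD:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "errors_demo.txt")

# Bytes that are valid Latin-1 but not valid UTF-8.
with open(path, "wb") as f:
    f.write(b"caf\xe9")

# 'strict' raises UnicodeDecodeError, a subclass of ValueError.
try:
    with open(path, "r", encoding="utf-8", errors="strict") as f:
        f.read()
    strict_error = None
except ValueError as exc:
    strict_error = type(exc).__name__

# 'ignore' drops the malformed byte, so data is lost.
with open(path, "r", encoding="utf-8", errors="ignore") as f:
    ignored = f.read()        # 'caf'

# 'replace' substitutes the replacement character U+FFFD.
with open(path, "r", encoding="utf-8", errors="replace") as f:
    replaced = f.read()       # 'caf\ufffd'

print(strict_error, repr(ignored), repr(replaced))
```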

newline

Newline determines how the end of a line is identified and translated. This can only be used with text files. Valid values are None, '', '\n', '\r' and '\r\n'.
  • '\n' - Line Feed (LF), used as the new line character in Unix/macOS.
  • '\r' - Carriage Return (CR), used as the new line character in classic Mac OS.
  • '\r\n' - Carriage Return (CR) Line Feed (LF), used as the new line sequence in Windows.
While reading the file, if newline is specified as:
  • None - Universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.
  • '' (empty string) - Universal newlines mode is enabled, but line endings are not translated before being returned to the caller.
  • Any other valid value (i.e., '\n', '\r' or '\r\n') - Input lines are only terminated by the given string, and the line endings are not translated before being returned.
While writing the file, if newline is specified as:
  • None - Any '\n' written is converted to the default line separator (the default can be found in os.linesep).
  • '' or '\n' - No translation takes place while writing.
  • Any other valid value - '\n' is converted to the string specified.
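The reading and writing rules above can be seen with a file that mixes all three line endings. A minimal sketch on a temporary file (the path is an assumption for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# One line ending of each kind, written as raw bytes.
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\rthree\n")

# None: every line ending is translated to '\n'.
with open(path, "r", newline=None) as f:
    translated = f.read()        # 'one\ntwo\nthree\n'

# '': all three endings split lines, but are returned untranslated.
with open(path, "r", newline="") as f:
    untranslated = f.readlines() # ['one\r\n', 'two\r', 'three\n']

# '\r\n': only '\r\n' terminates a line.
with open(path, "r", newline="\r\n") as f:
    crlf_only = f.readlines()    # ['one\r\n', 'two\rthree\n']

# Writing with newline='\r\n' converts each '\n' to '\r\n'.
with open(path, "w", newline="\r\n") as f:
    f.write("a\nb\n")
with open(path, "rb") as f:
    written = f.read()           # b'a\r\nb\r\n'
```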

closefd

Closefd determines whether the underlying file descriptor is closed when the file is closed. It only matters when a file descriptor, rather than a file name, is passed as file.
  • True (default) - The file descriptor is also closed when the file is closed. This is the only value allowed when a file name is passed.
  • False - Only the file object is closed, and the file descriptor is kept open.
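A sketch of closefd = False, assuming the descriptor comes from os.open() on a temporary path chosen for illustration. The descriptor stays usable after the file object is closed and has to be closed separately:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "closefd_demo.txt")

# Obtain a raw file descriptor and wrap it in a file object.
fd = os.open(path, os.O_WRONLY | os.O_CREAT)
f = open(fd, "w", closefd=False)
f.write("hello")
f.close()                      # flushes and closes the file object only

# The descriptor is still open, so it can be written to directly.
os.write(fd, b" world")
os.close(fd)                   # the descriptor must now be closed by hand

with open(path) as check:
    content = check.read()

# closefd=False is rejected when a file name is passed.
try:
    open(path, closefd=False)
    name_error = None
except ValueError as exc:
    name_error = str(exc)

print(content)
```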

opener

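The opener parameter can be used to pass a custom opener instead of the system opener (i.e., os.open). open() calls the opener with two arguments, (file, flags), to obtain the file descriptor.

The default value is None, which behaves much like passing os.open.

A custom opener must return an open file descriptor.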
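A minimal sketch of a custom opener. The function name logging_opener and the list opened_paths are hypothetical, introduced only for this illustration. The opener records each request and then delegates to os.open with restrictive permissions:

```python
import os
import tempfile

opened_paths = []   # hypothetical log of files opened through the opener

def logging_opener(file, flags):
    # Called by open() with the file name and the flags derived from the mode.
    opened_paths.append(file)
    # Must return an open file descriptor.
    return os.open(file, flags, 0o600)

path = os.path.join(tempfile.mkdtemp(), "opener_demo.txt")

with open(path, "w", opener=logging_opener) as f:
    f.write("via opener")

with open(path) as f:
    content = f.read()

print(opened_paths, content)
```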

Close a file

An opened file must be closed once the file operations (i.e., read/write) are completed. Closing the file flushes any buffered data and frees the system resources it holds.

The close() method is used to close the file.

Syntax

fileobject.close()

close() is called on the file object returned by open().
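A short sketch of closing a file reliably, using a temporary path chosen for illustration. The try/finally form makes sure close() runs even if a write fails. The with statement, which is not covered above, is a common alternative that calls close() automatically:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "close_demo.txt")

# Explicit close(), guarded so it runs even if an error occurs.
test_file = open(path, "w")
try:
    test_file.write("data")
finally:
    test_file.close()
closed_after = test_file.closed        # True

# A closed file object can no longer be used.
try:
    test_file.write("more")
    write_failed = False
except ValueError:
    write_failed = True

# Alternative: the with statement closes the file at the end of the block.
with open(path) as f:
    content = f.read()
closed_by_with = f.closed              # True
```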

