
SHA-256 Hash Function: What is it and How to Implement?

SHA-256 (Secure Hash Algorithm - 256)

SHA-256 (Secure Hash Algorithm - 256) is a cryptographic hash function that is widely used across various applications. It is part of the SHA-2 family of hash functions, which was designed by the National Security Agency (NSA) and first published in 2001. The '256' stands for the length of the generated hash value in bits (i.e., a 64-character hexadecimal value).
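
As a quick check of the 256-bit / 64-character relationship, here is a minimal sketch using Python's standard 'hashlib' module. The input b"abc" is only an illustrative value chosen for this example.

```python
import hashlib

# Hash a short byte string; b"abc" is just an illustrative input.
hash_object = hashlib.sha256(b"abc")

raw = hash_object.digest()           # the hash as raw bytes
hex_value = hash_object.hexdigest()  # the same hash as hexadecimal text

print(len(raw) * 8)    # number of bits in the hash
print(len(hex_value))  # number of hexadecimal characters
print(hex_value)
```

Each hexadecimal character encodes 4 bits, which is why 256 bits are written as 64 characters.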

SHA-256 is considered secure and is a significant improvement over earlier hash functions (MD5 and SHA-1). Below are a couple of the significant improvements in SHA-256. 
  • Increased Security - SHA-256 has considerably stronger security properties and is resistant to the collision attacks, pre-image attacks and other vulnerabilities observed in those earlier functions. 
  • Robust Hash Function - Its hash value is unpredictable and provides a high degree of resistance to collision attacks, making it suitable for use in various applications. 
Let's dive into the step-by-step process of generating the 64-character hash value using the SHA-256 hash function. 

Step-by-Step Process

Generating a hash using SHA-256 can be broken down into the steps below. 
  • Message preprocessing 
  • Initializing State variables 
  • Process Message Blocks
  • Concatenating the final Hash Value
Let's dive into each of the steps in detail.

Message Preprocessing

The goal of message preprocessing is to ensure that the total length of the input message (with padding added) is a multiple of 512 bits. SHA-256 works on blocks of 512 bits each (like MD5 and SHA-1). 

Message preprocessing would typically contain three steps. 
  • Converting the input message into binary format (if it is not already binary). 
  • Padding the binary input message to ensure the length of the message is congruent to 448 modulo 512. 
    • Similar to the padding in MD5 and SHA-1, '1' bit is added to the end of the message and this '1' bit acts as a delimiter. 
    • Add the '0' bits to the binary input message until the length of the input message is congruent to 448 modulo 512. 
  • Add 64 bits, which indicate the actual length of the input message before any padding.
Padding is essential to ensure that the total length of the input message (including the '1' bit, the '0' bits and the 64 bits for the length) is a multiple of 512 bits. I'm not going into detail on how the padding works or how the number of bits to be added is calculated, as this has been covered with examples in my previous post on MD5.
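
The three preprocessing steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name 'pad_message' is hypothetical, and the sketch assumes the message is already a whole number of bytes, so the '1' bit and the first seven '0' bits are written together as the single byte 0x80.

```python
def pad_message(message: bytes) -> bytes:
    """Return the message with SHA-256 padding appended (hypothetical helper)."""
    bit_length = len(message) * 8

    # '1' delimiter bit followed by seven '0' bits (byte-oriented shortcut).
    padded = message + b"\x80"

    # Add '0' bytes until the length is congruent to 56 modulo 64 bytes,
    # which is the same as 448 modulo 512 bits.
    padded += b"\x00" * ((56 - len(padded)) % 64)

    # Append the original length in bits as a 64-bit big-endian integer.
    padded += bit_length.to_bytes(8, "big")
    return padded


padded = pad_message(b"abc")
print(len(padded) * 8)        # total length in bits after padding
print(len(padded) * 8 % 512)  # remainder when divided by the block size
```

A 55-byte message still fits in a single block, while a 56-byte message no longer leaves room for the delimiter and the length field, so it spills over into a second block.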

Below is the formula for calculating the number of '0' bits to be padded, for quick reference. 

Number of '0' Bits = (448 - (Message Length + 1) % 512) % 512

Let's look at why 448 matters here, even though SHA-256 actually processes blocks of 512 bits.

64 bits must always be appended at the end of the input message to indicate the length of the actual input message. '512 - 64 = 448' is therefore the required length (modulo 512) of the message after the '1' bit and the '0' bits have been padded. So, any padded message whose length is congruent to 448 modulo 512 becomes a multiple of 512 bits once its length is added at the end.

Let's break down the above formula. 
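  • Message Length: Length of the actual input message in bits (before any padding). 
  • (Message Length + 1): The '+1' accounts for the mandatory delimiter '1' bit. 
  • Subtracting from 448: Gives the number of '0' bits to be padded. The outer '% 512' keeps the result between 0 and 511 when the message (plus the '1' bit) already extends past the 448-bit mark.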
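
The formula can be tried out for a few message lengths with a short Python sketch. The function name 'zero_padding_bits' and the sample lengths are assumptions made for this illustration.

```python
def zero_padding_bits(message_length_bits: int) -> int:
    """Number of '0' bits to add after the '1' delimiter bit."""
    return (448 - (message_length_bits + 1) % 512) % 512


# Sample message lengths in bits, chosen to cover the edge cases around 448.
for length in (24, 447, 448, 512):
    zeros = zero_padding_bits(length)
    # message + '1' bit + '0' bits + 64-bit length field
    total = length + 1 + zeros + 64
    print(length, zeros, total, total % 512 == 0)
```

For a 24-bit message such as "abc", 423 '0' bits are needed, giving 24 + 1 + 423 + 64 = 512 bits. A 448-bit message needs 511 '0' bits, because the '1' bit already pushes it past the 448-bit mark and a second block is required.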

Initializing State Variables

The SHA-256 algorithm uses eight state variables, each 32 bits in size (the MD5 algorithm uses 4 state variables and SHA-1 uses 5). The number of state variables depends on the length of the hash: SHA-256 produces a hash value of 256 bits, and eight state variables of 32 bits each give a total of 256 bits (8 * 32). 

These variables are initialized with predetermined constant values (listed below) and are modified during the hash generation process. The state variables are usually referred to as H0, H1, H2, H3, H4, H5, H6 and H7. 

H0 = 0x6A09E667
H1 = 0xBB67AE85
H2 = 0x3C6EF372
H3 = 0xA54FF53A
H4 = 0x510E527F
H5 = 0x9B05688C
H6 = 0x1F83D9AB
H7 = 0x5BE0CD19

The initial constant values for these state variables are taken from the first 32 bits of the fractional parts of the square roots of the first eight prime numbers (2, 3, 5, 7, 11, 13, 17 and 19). The state variables are modified throughout the hash generation process, so the internal state depends on every message block processed so far, which contributes to the resistance against pre-image and collision attacks. 

The SHA-256 algorithm performs various calculations on each message block during the hash generation process, and the state variables store the intermediate values between blocks.  

After processing each message block, the intermediate values of the state variables are used as an input for the processing of the next block. 
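
The derivation of these constants can be reproduced with a few lines of Python. This is a sketch under the assumption that exact integer arithmetic is preferred over floating point; the names 'FIRST_EIGHT_PRIMES' and 'initial_state_word' are made up for this example.

```python
import math

FIRST_EIGHT_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)


def initial_state_word(prime: int) -> int:
    """First 32 bits of the fractional part of sqrt(prime)."""
    # isqrt(prime << 64) equals floor(sqrt(prime) * 2**32), i.e. the integer
    # part followed by 32 fractional bits. The mask keeps the fractional bits.
    return math.isqrt(prime << 64) & 0xFFFFFFFF


for index, prime in enumerate(FIRST_EIGHT_PRIMES):
    print(f"H{index} = 0x{initial_state_word(prime):08X}")
```

The printed values should match the H0 to H7 constants listed above.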

Process Message Blocks

Processing the message blocks is the primary part of the whole hash generation process. In this step, each message block (of 512 bits) would be fed as an input to the SHA-256 compression function along with the 8 state variables. 

Let's have a look at what the compression function does. 
  • The first thing the compression function does is split the input message block (of 512 bits) into 16 words of 32 bits each. These words are labelled W[0], W[1], W[2] . . . W[15]. 
  • The compression function uses an expansion algorithm (the message schedule) to expand these 16 words into an array of 64 words (labelled W[0], W[1], W[2] . . . W[63]). Each new word is derived from four earlier words using bitwise rotations, shifts, XOR and addition modulo 2^32. 
  • The compression function goes through 64 rounds. Every round applies the same combination of logical, bitwise and modular operations, but with its own round constant and its own word W[t] from the expanded array. 
  • At the start of each block, eight working variables (a, b, c, d, e, f, g and h) are set to the current values of the eight state variables. These working variables are updated in every round using the AND, XOR and NOT logical functions, bitwise rotations and modular additions. These operations are designed to introduce diffusion and confusion, so that any small change to the input results in a completely different hash. 
  • After the 64 rounds, each working variable is added (modulo 2^32) to the corresponding state variable, and the updated state variables are used as input when the next message block is processed. 
By the end of the processing of the last message block, the eight state variables contain the hash values that have gone through multiple rounds of processing. 
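
The steps above can be sketched as a small Python function that processes one 512-bit block. This is an illustrative sketch, not production code. The helper names ('_primes', '_icbrt', 'rotr', 'compress') are hypothetical, and the 64 round constants, which this post does not list, are derived here from the cube roots of the first 64 prime numbers as defined in the SHA-256 specification.

```python
import hashlib

MASK = 0xFFFFFFFF  # keeps every value within 32 bits (addition modulo 2**32)


def _primes(count):
    """Return the first `count` prime numbers by trial division."""
    found, candidate = [], 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(n):
    """Floor of the cube root of n, found by binary search."""
    low, high = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while low < high:
        mid = (low + high + 1) // 2
        if mid ** 3 <= n:
            low = mid
        else:
            high = mid - 1
    return low


# Round constants: first 32 fractional bits of the cube roots of 64 primes.
K = [_icbrt(p << 96) & MASK for p in _primes(64)]

INITIAL_STATE = [0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
                 0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19]


def rotr(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """Process one 64-byte block and return the updated eight state words."""
    # Split the block into 16 big-endian 32-bit words, then expand to 64.
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)

    # Working variables start from the current state.
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        choose = (e & f) ^ (~e & g)
        temp1 = (h + big_s1 + choose + K[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        majority = (a & b) ^ (a & c) ^ (b & c)
        temp2 = (big_s0 + majority) & MASK
        h, g, f, e = g, f, e, (d + temp1) & MASK
        d, c, b, a = c, b, a, (temp1 + temp2) & MASK

    # Add the working variables back into the state, modulo 2**32.
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


# "abc" fits in one block: message, 0x80 delimiter, zero bytes, 64-bit length.
block = b"abc" + b"\x80" + b"\x00" * 52 + (24).to_bytes(8, "big")
final_state = compress(INITIAL_STATE, block)
digest = b"".join(word.to_bytes(4, "big") for word in final_state)
print(digest.hex() == hashlib.sha256(b"abc").hexdigest())
```

The last lines compare the result against 'hashlib' for a single-block message. For longer messages, the state returned for one block would be passed in as the 'state' argument for the next block.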

Final Hash Value

To get the final hash value of 256 bits, the values (each of 32 bits) stored in the eight state variables are concatenated in order, from H0 to H7. 

The resulting hash value contains 256 bits and is usually represented as a 64-character hexadecimal string. Finding two different messages with the same hash value is computationally infeasible in practice, and any small change to the input message results in a completely different hash value. 
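
The concatenation step can be sketched as below. The function name 'state_to_hex' is hypothetical, and the eight words used as sample data are the final state values for the input "abc", taken from the well-known SHA-256 test vector for that string.

```python
import hashlib


def state_to_hex(state):
    """Join eight 32-bit words (H0..H7) into a 64-character hex string."""
    digest = b"".join(word.to_bytes(4, "big") for word in state)
    return digest.hex()


# Final state words for the message "abc" (sample data for this sketch).
final_state = [0xBA7816BF, 0x8F01CFEA, 0x414140DE, 0x5DAE2223,
               0xB00361A3, 0x96177A9C, 0xB410FF61, 0xF20015AD]

hash_value = state_to_hex(final_state)
print(hash_value)
print(hash_value == hashlib.sha256(b"abc").hexdigest())
```

Each word is written in big-endian order, so the hexadecimal string reads exactly like the eight words placed side by side.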

Implementing SHA-256 Hash

How do we generate a SHA-256 hash value for a given string? Let's look at an example in Python. 

For this example we will use the 'hashlib' module. It provides a function 'sha256', which accepts the encoded input string and generates the hash value.

import hashlib

# Accept the input string for generating the hash
input_string = input("Enter string: ")

# The input string needs to be encoded to bytes before generating the hash
encoded_input_string = input_string.encode()

# Pass the encoded string to the 'sha256' function from the hashlib module
hash_value = hashlib.sha256(encoded_input_string)

# Print the hexadecimal representation of the hash value
print(hash_value.hexdigest())


In the above example, 
  • encode(): The input string needs to be encoded to bytes. This can be done using the string method encode(). 
  • sha256(): The encoded input string is passed to the function sha256(), which returns a hash object. 
  • hexdigest(): The method 'hexdigest()' can be called on the hash object to get the hexadecimal hash value. 
Below is the sample result. 

Enter string: This is a String


Any minor change to the input string results in a completely different hash value. Below is the hash value for the same string with 'S' converted to lower case. 

Enter string: This is a string
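
To put a number on how different the two hash values are, the following sketch counts the bits that differ between them. The function name 'differing_bits' is an assumption made for this example, and UTF-8 is assumed as the string encoding.

```python
import hashlib


def differing_bits(first: str, second: str) -> int:
    """Count the bit positions in which the two SHA-256 hashes differ."""
    a = int.from_bytes(hashlib.sha256(first.encode("utf-8")).digest(), "big")
    b = int.from_bytes(hashlib.sha256(second.encode("utf-8")).digest(), "big")
    # XOR leaves a 1 in every position where the two hashes disagree.
    return bin(a ^ b).count("1")


count = differing_bits("This is a String", "This is a string")
print(f"{count} of 256 bits differ")
```

For a well-behaved hash function, roughly half of the 256 bits are expected to change even though only one character of the input was altered.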


Use Cases

SHA-256 is considered a secure hash function and is used in various domains. Listed below are a few of the many use cases. 
  • Data Integrity Verification
  • Digital Signatures
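  • Password Storage (as a building block inside a salted, deliberately slow scheme such as PBKDF2, rather than as a single plain hash)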
  • Blockchain Technology
  • Secure File Hashing
  • Secure Communication
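
As an illustration of the file hashing and integrity verification use cases, here is a minimal sketch that hashes a file in chunks, so that large files do not have to be loaded into memory at once. The function name 'sha256_of_file', the chunk size and the temporary file contents are all assumptions made for this example.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


# Demonstration with a throwaway temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example file contents")
    temp_path = tmp.name

try:
    file_hash = sha256_of_file(temp_path)
    print(file_hash)
finally:
    os.remove(temp_path)
```

To verify integrity, the computed value would be compared with a hash published by the file's provider; any mismatch means the file differs from the original.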

I hope this post has provided good insight into what the SHA-256 hash function is and how we can generate a hash using the SHA-256 algorithm in Python.

If you have any suggestions or feedback, please leave a comment below or use the Contact Form.

