
SHA-1 Hash Function: What is it and How to Implement?

SHA-1 (Secure Hash Algorithm - 1)

Hash functions are crucial for various security applications. SHA-1 (Secure Hash Algorithm 1) is one of the most widely used hash functions. It was designed by the National Security Agency (NSA) and published in 1995. The function accepts an input of almost any size (up to 2^64 - 1 bits) and generates a 160-bit hash value, usually written as 40 hexadecimal characters.

Let's get started with the step-by-step process of generating the 40-character hash value using the SHA-1 hash function.

Step-by-Step Process

At a high level, the steps involved in generating the hash value can be summarized as below.
  • Message Preprocessing
  • Initializing State Variables 
  • Process Message Blocks
  • Compression Function
  • Iterate Through Blocks
  • Final Hash Value
Let's dive deeper to understand each of these steps in detail. 

Message Preprocessing

SHA-1 works on blocks of 512 bits each. So, the first thing to be done is to convert the input message to binary format and pad it so that its length (in bits) is congruent to 448 modulo 512. To keep it simple, let's break this down as below.
  • Converting the input message into binary format (if it is not already binary).
  • Padding the binary input message so that its length is congruent to 448 modulo 512.
    • Similar to the padding in MD5, a '1' bit is added to the end of the message, and this '1' bit acts as a delimiter.
    • '0' bits are then added until the length of the message is congruent to 448 modulo 512.
  • Adding 64 bits that hold the actual length of the input message (in bits) before any padding. SHA-1 stores this length as a big-endian integer.
I'm not diving deep into how the padding works or how to calculate the number of bits to be added, as this has been detailed with examples in my previous post on MD5.

Below is the formula for calculating the number of '0' bits to be padded, for quick reference (all lengths are in bits).

Padding Length = (448 - (Message Length + 1) % 512) % 512

Let's find out why 448 matters here, given that SHA-1 actually processes blocks of 512 bits.

64 bits must always be added at the end of the input message to indicate the length of the actual input message. '512 - 64 = 448' is therefore the required length (modulo 512) of the message after the '1' bit and the '0' bits have been added. So, any message whose length is congruent to 448 modulo 512 becomes a multiple of 512 bits after its length is added at the end.

Let's break down the above formula. 
  • Message Length: Length of the actual input message (before any padding). 
  • (Message Length + 1): '+1' here is to consider the mandatory delimiter '1' bit. 
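  • Subtracting from 448: To get the actual number of '0' bits to be padded.
  • Outer '% 512': Keeps the result between 0 and 511 when the subtraction goes negative (i.e., when the message already runs past the 448-bit mark of its last block).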
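
To make the formula concrete, here is a minimal Python sketch of the padding step. The function names `zero_padding_bits` and `pad_message` are made up for this illustration, and the sketch assumes the message is a whole number of bytes (which is the usual case in practice).

```python
def zero_padding_bits(message_length_bits: int) -> int:
    """Number of '0' bits to append after the mandatory '1' bit."""
    return (448 - (message_length_bits + 1) % 512) % 512


def pad_message(message: bytes) -> bytes:
    """Return the message padded to a multiple of 64 bytes (512 bits)."""
    length_bits = len(message) * 8
    zero_bits = zero_padding_bits(length_bits)
    # For byte-aligned input, the '1' bit and the first seven '0' bits
    # together form the single byte 0x80.
    padding = b"\x80" + b"\x00" * ((zero_bits - 7) // 8)
    # The original length goes last, as a 64-bit big-endian integer.
    return message + padding + length_bits.to_bytes(8, "big")


padded = pad_message(b"abc")
print(zero_padding_bits(24), len(padded))  # 423 '0' bits, 64 bytes in total
```

For the 3-byte message "abc" (24 bits), the formula gives 423 '0' bits: 24 + 1 + 423 = 448, and adding the 64-bit length makes exactly one 512-bit block.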

Initialize State Variables

SHA-1 uses five state variables, each 32 bits in size (MD5 uses four). These variables are initialized with predetermined constant values (listed below) and are usually referred to as H0, H1, H2, H3 and H4.

H0 = 0x67452301
H1 = 0xEFCDAB89
H2 = 0x98BADCFE
H3 = 0x10325476
H4 = 0xC3D2E1F0

The initial values of the first four state variables are the same as the ones used in the MD5 hash function. Constant initial values for the state variables ensure a consistent starting point for generating the SHA-1 hash.

SHA-1 performs various calculations on each message block during the hash generation process, and these state variables are used to store the intermediate values.

After each message block is processed, the intermediate values of the state variables are used as an input for processing the next block.

Process Message Blocks

As mentioned earlier, SHA-1 works on blocks of 512 bits. The hash generation process splits the input message (after padding) into blocks of 512 bits before doing any processing (this could be one block or more, depending on the input message).

Each block is processed individually by the SHA-1 compression function. We will see what the compression function does in a bit.

The compression function accepts the message block of 512 bits and the state variables as inputs. It then applies a series of operations on these inputs: bitwise operations (like AND, OR, XOR), modular additions and bitwise rotations.

These operations ensure the input message block and the state variables are thoroughly mixed, and they generate the intermediate hash values, which are then stored in the state variables. These state variables are used as an input alongside the next message block.
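
Splitting the padded message is plain slicing. A small sketch, with `split_into_blocks` being a name chosen only for this example:

```python
def split_into_blocks(padded: bytes, block_size: int = 64) -> list:
    """Split an already padded message into 64-byte (512-bit) blocks."""
    if len(padded) % block_size != 0:
        raise ValueError("padded message must be a multiple of 64 bytes")
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


# A padded message of 128 bytes yields two blocks of 64 bytes each.
blocks = split_into_blocks(bytes(128))
print(len(blocks), len(blocks[0]))
```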

Compression Function

Let's have a quick look at what the SHA-1 compression function does. At a high level, this can be split into the below steps.
  • Breaking the input block into 16 words. 
  • Extend the 16 words into 80 words. 
  • Perform a series of logical and bitwise operations. 
  • Update state variables with intermediate hash value. 
Let's see what each step involves in a couple of lines.

Breaking the input block

The first thing the compression function does is break the input message block of 512 bits into 16 words, each containing 32 bits.

These words are labelled W0, W1, ... W15 and are used in the further computation.
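
In Python, the standard `struct` module can do this split in one call. SHA-1 reads each word in big-endian byte order, hence the `>` in the format string. The helper name `block_to_words` is just for illustration.

```python
import struct


def block_to_words(block: bytes) -> list:
    """Break one 64-byte block into 16 big-endian 32-bit words (W0..W15)."""
    if len(block) != 64:
        raise ValueError("a SHA-1 block is exactly 64 bytes")
    return list(struct.unpack(">16I", block))


# Sample block made of the bytes 0x00, 0x01, ... 0x3F.
words = block_to_words(bytes(range(64)))
print(hex(words[0]))  # 0x10203, i.e. the bytes 00 01 02 03
```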

Extending 16 words into 80 words

SHA-1 uses a specific expansion algorithm to expand the 16 words into 80 words.

The expansion algorithm uses a combination of logical and bitwise operations on the 16 words (W0 - W15) to generate the next set of words (W16 - W79). This process ensures that the 80 words are mixed sufficiently and that every one of them is influenced by the original 16 words.
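
The post does not spell out the expansion rule, so the sketch below takes it from the SHA-1 specification: each new word is the XOR of four earlier words, rotated left by one bit. The names `rotl32` and `expand_words` are illustrative.

```python
def rotl32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by 'count' bits."""
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def expand_words(words: list) -> list:
    """Extend the 16 block words (W0..W15) to 80 words (W0..W79)."""
    w = list(words)
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


# With only W0 set to 1, W16 = rotl(W13 ^ W8 ^ W2 ^ W0, 1) = rotl(1, 1) = 2.
schedule = expand_words([1] + [0] * 15)
print(len(schedule), schedule[16])
```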

Perform logical and bitwise operations

The compression function performs a series of logical and bitwise operations on the 80 words generated from the input block and the five input state variables (H0, H1, H2, H3 and H4). These operations run as 80 rounds, one round per word.
  • Bitwise AND, OR and XOR operations combine bits from different words and/or state variables based on specific rules.
  • These words and state variables go through modular addition (addition modulo 2^32) and produce new values.
  • Left bitwise rotations are applied to words and state variables, which shifts bits around and creates new patterns.
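
The three kinds of operations above come together in the 80 rounds. The sketch below uses the round functions and constants from the SHA-1 specification (they are not listed in this post), while the names `round_function` and `run_rounds` are invented for the example.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by 'count' bits."""
    return ((value << count) | (value >> (32 - count))) & MASK32


def round_function(t: int, b: int, c: int, d: int):
    """Return the mixing value f and the round constant k for round t."""
    if t < 20:
        return (b & c) | (~b & d), 0x5A827999        # choose: b picks c or d
    if t < 40:
        return b ^ c ^ d, 0x6ED9EBA1                 # parity
    if t < 60:
        return (b & c) | (b & d) | (c & d), 0x8F1BBCDC  # majority
    return b ^ c ^ d, 0xCA62C1D6                     # parity


def run_rounds(state, w):
    """Run the 80 rounds over the expanded words and return a, b, c, d, e."""
    a, b, c, d, e = state
    for t in range(80):
        f, k = round_function(t, b, c, d)
        temp = (rotl32(a, 5) + f + e + k + w[t]) & MASK32  # addition modulo 2^32
        a, b, c, d, e = temp, a, rotl32(b, 30), c, d
    return a, b, c, d, e


initial = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
working = run_rounds(initial, [0] * 80)
```

Here `a` to `e` are working copies of H0 to H4, so the state variables themselves stay untouched until the rounds are done.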

Update state variables

The logical and bitwise operations performed by the compression function produce five 32-bit output values. Each of them is added (modulo 2^32) to the corresponding state variable, and the results form the new intermediate hash value held in the state variables.

These state variables then become the input for processing the next message block of 512 bits.
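
As a small sketch of this update, assuming the compression rounds have produced five 32-bit values (called `working` here, a name picked for this example):

```python
def update_state(state, working):
    """Add each round output to its state variable, modulo 2^32."""
    return tuple((h + v) & 0xFFFFFFFF for h, v in zip(state, working))


# The first addition overflows 32 bits and wraps around to 0.
new_state = update_state((0xFFFFFFFF, 0, 0, 0, 0), (1, 2, 3, 4, 5))
print(new_state)  # (0, 2, 3, 4, 5)
```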

Iterate Through Blocks

The same process (i.e., the compression function) is repeated for each message block of the input message.

In brief, the compression function takes the constant initial state variables and the first message block as input and generates five intermediate hash values.

These intermediate hash values are then stored in the state variables and act as an input alongside the next message block.

This process is repeated until the last message block.
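
Putting the steps so far together, below is a compact from-scratch sketch of the whole loop. It is meant for study only (for real use, stick to a library implementation). The round functions and constants come from the SHA-1 specification, and the names `compress` and `sha1_hex` are made up for this example. The last lines check the result against Python's built-in `hashlib`.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF
INITIAL_STATE = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl32(value: int, count: int) -> int:
    return ((value << count) | (value >> (32 - count))) & MASK32


def compress(state, block: bytes):
    """Process one 64-byte block and return the updated state."""
    w = list(struct.unpack(">16I", block))           # 16 words
    for t in range(16, 80):                          # extend to 80 words
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):                              # 80 rounds
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (rotl32(a, 5) + f + e + k + w[t]) & MASK32
        a, b, c, d, e = temp, a, rotl32(b, 30), c, d
    # Add the round outputs to the incoming state, modulo 2^32.
    return tuple((h + v) & MASK32 for h, v in zip(state, (a, b, c, d, e)))


def sha1_hex(message: bytes) -> str:
    """Pad, split into blocks, chain the compression function, format."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")
    state = INITIAL_STATE
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])    # output feeds the next block
    return "".join(f"{h:08x}" for h in state)


message = b"This is a String"
assert sha1_hex(message) == hashlib.sha1(message).hexdigest()
```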

Final Hash Value

After all the input message blocks have been processed, the five state variables (each of 32 bits) contain the final hash value.

Concatenating the values of the five state variables (H0 through H4, in that order) creates a single hash value of 160 bits, written as 40 hexadecimal characters.

This 40-character value is the final hash generated by the SHA-1 hash function and acts as a fingerprint (digest) of the input message. Any small change to the input message results in a completely different hash value due to the operations performed at the different steps of the input message processing.
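
The concatenation itself is simple formatting. In the sketch below, `state_to_hex` is an illustrative name, and the five sample values are the final state for the input "abc" (taken from the well-known SHA-1 test vector for that string).

```python
import hashlib


def state_to_hex(state) -> str:
    """Join five 32-bit state variables into a 40-character hex string."""
    # '08x' keeps leading zeros, so every variable contributes exactly 8 characters.
    return "".join(f"{word:08x}" for word in state)


final_state = (0xA9993E36, 0x4706816A, 0xBA3E2571, 0x7850C26C, 0x9CD0D89D)
digest = state_to_hex(final_state)
assert digest == hashlib.sha1(b"abc").hexdigest()
print(digest, len(digest))
```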

Implementing SHA-1 Hash

How do we generate a SHA-1 hash value for any given string? Let's look at an example in Python.

For this example, we will use the module 'hashlib'. It has a function 'sha1' which accepts the encoded input string and generates the hash value.

import hashlib

# Accept the input string for generating the hash
input_string = input("Enter string: ")
# Input string needs to be encoded before generating the hash
encoded_input_string = input_string.encode()
# Encoded string is passed to the 'sha1' function from the hashlib module
hash_value = hashlib.sha1(encoded_input_string)

print(hash_value.hexdigest())



In the above example,
  • Line 6: The input string needs to be encoded to bytes. This can be done using the string method encode(), which uses UTF-8 by default.
  • Line 8: The encoded input string is passed to the function sha1(), which returns a hash object.
  • Line 10: The method hexdigest() can be called on the hash object to get the hexadecimal hash value.
Below is the sample result.

Enter string: This is a String

8416c466810d60718d66bfb5e7214af36ce868ab


Any minor change to the input string results in a completely different hash value. Below is the hash value for the same string with 'S' converted to lower case.

Enter string: This is a string

f72017485fbf6423499baf9b240daa14f5f095a1
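
To put a number on "completely different", one option is to count how many of the 160 output bits differ between the two hashes. This is just a small sketch, and `differing_bits` is a name made up for it.

```python
import hashlib


def differing_bits(first: str, second: str) -> int:
    """Count the bit positions in which the two SHA-1 digests differ."""
    a = int.from_bytes(hashlib.sha1(first.encode()).digest(), "big")
    b = int.from_bytes(hashlib.sha1(second.encode()).digest(), "big")
    return bin(a ^ b).count("1")


changed = differing_bits("This is a String", "This is a string")
print(f"{changed} of 160 bits differ")
```

For a well-mixed hash, we would expect roughly half of the bits to flip even though only a single character changed.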



I hope this post has provided good insight into what the SHA-1 hash function is and how we can generate a hash using the SHA-1 algorithm in Python. Please note that SHA-1 is considered insecure for cryptographic purposes because practical collision attacks have been demonstrated, and a more secure hash function (like SHA-256) is recommended.
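
Switching to SHA-256 in Python needs very little change, since 'hashlib' exposes it the same way. A minimal sketch (the helper name `sha256_hex` is only for illustration):

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = sha256_hex("This is a String")
print(len(digest))  # 64 hexadecimal characters (256 bits)
```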


