
What is a Hash Function? A quick look into different Hash Functions

Hash Functions

Hash functions have become essential for maintaining data integrity, security and authenticity. A hash function accepts input data (a simple message, a file or data in any other format) and generates a fixed-length string of characters, which is called a hash. Technically, a hash function is a mathematical algorithm.

For practical purposes, this hash value is unique to the specific input: even a small change in the input produces a completely different hash. This makes hash functions essential for applications such as password storage, digital signatures, message authentication and data integrity verification.
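To see how a small change in the input affects the hash, here is a minimal sketch using Python's standard hashlib module. The helper names sha256_hex and differing_bits, and the sample messages, are made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex hashes differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = sha256_hex("Hello, world")
modified = sha256_hex("Hello, world!")  # same message with one extra character

print(original)
print(modified)
print(differing_bits(original, modified), "of 256 bits differ")
```

The two hashes printed should look unrelated, even though the inputs differ by a single character.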

Below are some of the key properties that every hash function should have. 
  • Deterministic - A hash function should always return the same hash for the same input. 
  • Fixed Output Size - A hash function should return a hash of fixed length, irrespective of the input size. 
  • Preimage Resistance - It should be computationally infeasible to determine the original input from the hash. 
  • Collision Resistance - It should be computationally infeasible to find two different inputs that generate the same hash. 
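The first two properties are easy to check in code. The sketch below uses SHA-256 from Python's hashlib; the helper name digest and the sample inputs are assumptions for illustration. Preimage resistance and collision resistance cannot be demonstrated by a short script, since they are claims about what is infeasible to compute.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short_input = b"a"
long_input = b"a" * 1_000_000  # one million bytes

# Deterministic: hashing the same input twice gives the same hash.
print(digest(short_input) == digest(short_input))  # True

# Fixed output size: both hashes are 64 hexadecimal characters long.
print(len(digest(short_input)), len(digest(long_input)))  # 64 64
```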
There are different hash functions available, and several factors have to be considered, based on the need, when deciding which hash function to use. 
  • Security - The hash function should be able to withstand preimage attacks and collision attacks. 
  • Performance - The hash function should be reasonably fast even for large amounts of input data. Security and performance have to be weighed together. 
  • Standardization and Interoperability - The hash function should have undergone a considerable amount of analysis, scrutiny and standardization, which ensures interoperability and compatibility with existing applications and cryptographic protocols. 
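On the performance point, large inputs do not have to be loaded into memory at once. The sketch below feeds data to the hash function in chunks using hashlib's update() method. The function name hash_stream and the chunk size are assumptions chosen for illustration, and an in-memory stream stands in for a large file.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune it for your workload


def hash_stream(stream: BinaryIO, algorithm: str = "sha256") -> str:
    """Hash a binary stream chunk by chunk and return the hex hash."""
    hasher = hashlib.new(algorithm)
    while chunk := stream.read(CHUNK_SIZE):
        hasher.update(chunk)
    return hasher.hexdigest()


# An in-memory stream stands in for a large file opened with open(path, "rb").
data = b"x" * (3 * CHUNK_SIZE + 123)
print(hash_stream(io.BytesIO(data)) == hashlib.sha256(data).hexdigest())  # True
```

Hashing in chunks gives the same result as hashing the whole input in one call, so the chunk size only affects memory use and speed.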
Below are the various Hash functions available. 
  • MD5 (Message Digest Algorithm 5)
  • SHA-1 (Secure Hash Algorithm 1) 
  • SHA-256 (Secure Hash Algorithm 256-bit)
  • SHA-3 (Secure Hash Algorithm 3)
There are many other hash functions available apart from the ones listed above. Let's have a look at each of these hash functions in brief. 
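As a quick comparison before going through them one by one, this sketch prints the output size of each of the four functions using Python's hashlib. The sample message is arbitrary, and MD5 may be unavailable on some restricted (for example FIPS-mode) Python builds.

```python
import hashlib

message = b"hash functions"  # arbitrary sample input

# Output size in bits for each algorithm, keyed by its hashlib name.
sizes = {}
for name in ("md5", "sha1", "sha256", "sha3_256"):
    hasher = hashlib.new(name, message)
    sizes[name] = hasher.digest_size * 8
    hex_chars = len(hasher.hexdigest())
    print(f"{name:<9} {sizes[name]:>3} bits  {hex_chars:>3} hex characters")
```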

MD5 (Message Digest Algorithm 5)

MD5 is one of the most popular hash functions, developed in 1991 by Ronald Rivest. It produces a 128-bit hash value, written as 32 hexadecimal characters. MD5 accepts inputs of different sizes and transforms them into fixed-size hash values. 

Some of the primary advantages of the MD5 algorithm are its speed and support. 
  • MD5 is fast and efficient, which makes it suitable for applications that need to generate hashes quickly. 
  • MD5 is supported by many programming languages and platforms, which makes it easy to implement. 
There are some disadvantages as well, due to which MD5 is no longer recommended for any cryptographic purpose. 
  • Collision vulnerability - MD5 has been proven vulnerable to collisions: researchers have demonstrated pairs of different inputs that generate the same hash value. 
  • Possibility of preimage attacks - Theoretical attacks that find a matching input for a given hash value faster than brute force have been published, although they are not considered practical. 
Newer hash functions like SHA-256 and SHA-3 do not have these weaknesses and are recommended instead. More details on the MD5 hash function can be found here

SHA-1 (Secure Hash Algorithm 1)

SHA-1 was designed by the National Security Agency (NSA) and published in 1995. It produces a 160-bit hash value, written as 40 hexadecimal characters. SHA-1 accepts inputs of different sizes and applies a series of mathematical operations to generate a hash value of fixed size. 

Like MD5, SHA-1 also has some security concerns.
  • Collision vulnerability - Researchers have demonstrated that SHA-1 is vulnerable to collisions by producing two different inputs with the same SHA-1 hash value. 
  • Computational power - With the increased availability of computational power, SHA-1 has become more vulnerable to attacks and is considered less secure. 
Even though SHA-1 is considered less secure, it is still used in some legacy systems. These need to be migrated to more secure hash functions. 

SHA-256 (Secure Hash Algorithm 256-bit)

SHA-256 is part of the SHA-2 family of hash functions, which was developed by the National Security Agency and published in 2001. It is an upgrade over SHA-1 that addresses the vulnerabilities present in SHA-1. It produces a 256-bit hash value, written as 64 hexadecimal characters. 

SHA-256 brings significant improvements over its predecessor. 
  • Increased security - SHA-256 is considerably stronger and is resistant to collision attacks, preimage attacks and the other vulnerabilities observed in its predecessor. 
  • Robust Hash Function - Its hash value is unpredictable and provides a high degree of resistance against collision attacks. This makes it suitable for use in various applications.
Due to its security, SHA-256 is widely used in different domains (such as blockchain technology, digital signatures, password storage and secure communication protocols). 
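Message authentication, one of the uses mentioned earlier, is commonly built on SHA-256 through HMAC, which combines the hash function with a secret key. The sketch below uses Python's standard hmac module. The function names sign and verify, the key and the messages are made up for illustration, and a real key should come from a secure source.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return an HMAC-SHA256 tag for `message` as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag using a comparison that does not leak timing information."""
    return hmac.compare_digest(sign(key, message), tag)


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only
tag = sign(key, b"transfer 100")

print(verify(key, b"transfer 100", tag))  # True
print(verify(key, b"transfer 900", tag))  # False
```

Without the key, a tampered message cannot be given a matching tag, which is what lets the receiver trust both the content and the sender.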

Even though SHA-256 is considered one of the stronger hash functions, it could still become vulnerable to some attacks (none of which have been demonstrated in practice yet). 
  • With advancements in cryptanalysis, SHA-256 could potentially become vulnerable to collision attacks like its predecessor (nothing has been proved yet). 
  • With advancements in quantum computing, quantum computers could potentially mature to the point where they weaken the security that SHA-256 provides. 
There could be other vulnerabilities that SHA-256 is exposed to in the future.

SHA-3 (Secure Hash Algorithm 3)

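As the name states, SHA-3 is the third generation of the SHA (Secure Hash Algorithm) family. It is based on the Keccak algorithm, which the National Institute of Standards and Technology (NIST) selected through a public competition and standardized in 2015. It was designed as an alternative to the older versions, with a different internal structure, so that weaknesses found in them would not carry over. 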

Below are some of the key features of the SHA-3 hash function. 
  • Different output sizes - SHA-3 provides a framework that allows generation of different output sizes (SHA3-224, SHA3-256, SHA3-384 and SHA3-512, where the last three digits represent the size of the generated hash value in bits). This gives users the flexibility to select the appropriate hash size for their requirements. 
  • Security - SHA-3 provides a high level of security against various cryptographic attacks (like collision attacks, preimage attacks and other known vulnerabilities). 
  • Different internal structure - Unlike its predecessor (the SHA-2 family), SHA-3 uses a different internal structure, the Keccak sponge construction, which provides high security and resistance to attacks. 
These features give SHA-3 enhanced security and efficient performance, and make it a future-proof algorithm. 
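The different output sizes can be tried out with Python's hashlib, which provides one constructor per SHA-3 variant. This is a small sketch; the sample message and the dictionary name sha3_variants are assumptions for illustration.

```python
import hashlib

message = b"hash functions"  # arbitrary sample input

# Map each SHA-3 variant name to its hashlib constructor.
sha3_variants = {
    "SHA3-224": hashlib.sha3_224,
    "SHA3-256": hashlib.sha3_256,
    "SHA3-384": hashlib.sha3_384,
    "SHA3-512": hashlib.sha3_512,
}

for label, constructor in sha3_variants.items():
    hex_hash = constructor(message).hexdigest()
    # Each hexadecimal character encodes 4 bits of the hash.
    print(f"{label}: {len(hex_hash) * 4} bits, starts with {hex_hash[:16]}")
```

Note that SHA3-256 and SHA-256 produce hashes of the same length but different values for the same input, because their internal structures differ.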

SHA-3 is used in some cryptocurrency and blockchain networks, hardware, IoT security, digital signatures and key derivation. 

Other Hash Functions

Below are a few of the other hash functions available. 
  • BLAKE2 - This is considered one of the high-performance hash functions, as it provides a high level of security along with excellent performance. Its simplicity, speed and security make it suitable for digital signatures, password hashing and checksum verification. 
  • Whirlpool - Whirlpool produces a 512-bit hash value, written as 128 hexadecimal characters, and provides a high level of security and resistance to various known vulnerabilities. It is commonly used in digital forensics, data integrity checks and secure password storage, which require a strong hash function. 
The hash functions available are not limited to the ones named in this post; there are many others. 
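BLAKE2 is available in Python's hashlib as blake2b and blake2s, and the output size can be chosen through the digest_size parameter. The sketch below uses an arbitrary sample message. Whirlpool is left out because it is not among the algorithms that hashlib guarantees on every platform.

```python
import hashlib

message = b"hash functions"  # arbitrary sample input

# blake2b produces 64 bytes (512 bits) by default.
default_hash = hashlib.blake2b(message).hexdigest()

# digest_size asks for a shorter output, here 32 bytes (256 bits).
short_hash = hashlib.blake2b(message, digest_size=32).hexdigest()

# blake2s produces 32 bytes (256 bits) by default.
blake2s_hash = hashlib.blake2s(message).hexdigest()

print(len(default_hash), len(short_hash), len(blake2s_hash))  # 128 64 64
```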

Key Parameters to consider

There are some key parameters to consider while choosing a hash function. 
  • Security Strength 
  • Performance and Efficiency
  • Standardization and Interoperability
  • Future proofing 
Apart from these, application-specific requirements should be taken into consideration while choosing a hash function. 
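One way to keep the future proofing option open is to store the name of the hash function next to each hash, so that a different function can be adopted later without breaking older records. The sketch below is only an illustration of that idea: the names ALLOWED_ALGORITHMS, make_record and check_record, and the "name:hash" record format, are all assumptions rather than any standard.

```python
import hashlib

# Hypothetical allow-list; extend it when a new hash function is adopted.
ALLOWED_ALGORITHMS = {"sha256", "sha3_256", "blake2b"}
DEFAULT_ALGORITHM = "sha256"


def make_record(data: bytes, algorithm: str = DEFAULT_ALGORITHM) -> str:
    """Return '<algorithm>:<hex hash>' so the algorithm travels with the hash."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"


def check_record(data: bytes, record: str) -> bool:
    """Recompute the hash with the algorithm named in the record and compare."""
    algorithm, _, expected = record.partition(":")
    if algorithm not in ALLOWED_ALGORITHMS:
        return False
    return hashlib.new(algorithm, data).hexdigest() == expected


old_record = make_record(b"abc")              # written with the default
new_record = make_record(b"abc", "sha3_256")  # written after a migration
print(check_record(b"abc", old_record), check_record(b"abc", new_record))  # True True
```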

I hope this post has provided a good insight into what a hash function is and, in brief, the different hash functions available.

If you have any suggestions or feedback, please leave a comment below or use the contact form.

