SQL Substring Function Examples with T-SQL, R and Python

By:   |   Updated: 2022-01-24   |   Comments (2)   |   Related: > Functions System


Problem

String manipulation is one of the most fundamental data manipulation items used in almost every data processing exercise. Generally, string manipulation objectives are formatting, trimming, padding, replacing, casting and similar functions. To apply these functions, often one needs to select specific parts of the string, and then apply these transformations. In Microsoft SQL Server, there are three languages that are often used in conjunction with each other – T-SQL, R and Python. All these programming / query languages offer string selection using substring or equivalent function / operators. In this tip we will learn how to use the substring function for string selection in all these three languages.

Solution

The substring function or equivalent operators in languages like T-SQL, R and Python enables selecting portions of a string and we will show how this can be done using T-SQL, R and Python when using SQL Server.

SQL Server T-SQL Substring Function

First, we will start with the substring function in the T-SQL language. The syntax of substring function is shown in the following example.

SELECT SUBSTRING ( expression , start , length )  			

The expression means the actual string, field name or a variable of character data type. The start parameter is the starting position and length is the total number of characters of the string to be selected.

In the following queries, the first example selects the part of the staring that starts at position 1 with a length of 5 characters from the index starting position. This means that from the input string "hello world", "h" is the first character in the string and five characters from "h" in the first position would end at "o". So, the function extracts the string "hello". The second example starts the selection from position -2, and the length parameter is 5, which will have the equivalent of starting at position 1 with a length of 3 and the output would be "he" as seen below.

Substring Function in T-SQL

We will now try some examples of using the substring function in the WHERE clause of a SELECT query. We can use the output from sys.tables as shown below. Consider that we want to select all the table names that have the word "Pro" in it.

Substring Function in T-SQL

We can use the substring function in the where clause and use the substring function as part of the filter criteria. In this substring function, one way to include the criteria is to mention the expression as the "name" field, start position as 1 and length as 3 as we know the word "Pro" is 3 characters long. As the criteria must be Boolean in the WHERE clause, we will use the substring function as stated and try to match it with the word "Pro". Execute the below query and the output would be all the table names that starts with the word "Pro".

Substring Function in T-SQL

The above result is only partially correct, as we get all the table names that starts with "Pro", but there are also possibilities where the word "Pro" would be in the middle or the end of the string. This brings up a situation where we need to find the start position of the word "Pro" dynamically. We can use the charindex function for this purpose. This function takes the string to search for as the first argument, the field / variable / value in which to search the string for as the next argument, and the start position as the third optional argument. Modify the SELECT query as shown below, and this time you will find all the table names having the word "Pro" in it.

Substring Function in T-SQL

Substring Function in R

Now let’s understand how substring works in the R programming language. R offers substring and substr functions that have equivalent functionality of the substring function in T-SQL. You can execute R code in T-SQL using the sp_execute_external_script stored procedure.

In the below examples, the first example demonstrates the use of the substr function which works exactly as the substring function in T-SQL. You can also use substring function, which takes the string, start position and end position as the arguments. You can use ranges for the start position and end position parameter.

In the second example, we are specifying the range of 1 to 5 for start position as well as the end position, and the result would be as shown below. You would get every letter as the output for the first five characters of the string, as the parameters would be executed as start position 1 and end position 1, start position 2 and end position 2 and so on. So, the output length is always 1 character.

If we slightly modify the code, and keep the start position constant and the end position as a range, the output would be in the order of increasing length as shown in the results below.

Substring Function in R

Substring Function in Python

Now let’s look at how to perform substring equivalent functionality in Python programming language.

String in python is treated as an array of Unicode characters. Python does not have a datatype equivalent of characters, as strings are treated just as arrays. Arrays in Python are accessed using ordinal positions surrounded by square brackets. So, let’s understand how to perform array operations on string literals.

In the first line of code, we are creating a string literal by assigning it a string value. In the next line of code, we are accessing the first character of the string literal by accessing the array element at position 1, which is the first letter of the string. If we use negative position, for example -4, then it would start the selection from the right side of the string. If you analyze the result for the third line of the code, you can see how negative ordinal position works.

Let’s say that we want to use it like a substring function where we want to specify the start and end position, in that case we can provide a range for the array element. In the fourth line of code we are providing the range of 1 to -4, means the selection would start from the first element and end at the fourth element from last. Analyze the result now with this logic, and you would understand how the selection is extracted from the original string.

Let’s consider another example, where we want to select every other 3rd character in a string. You can use the double colon operator in the array and mention the intervals as shown in the last line of the code, which is 3 in our case and the output would be a string with a selection of every third character as shown below.

Substring Function in Python

In this way, you can perform string selection operations in different languages using the substring function or its equivalent with nearly similar syntax and / or parameters for string selection.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Siddharth Mehta Siddharth Mehta is an Associate Manager with Accenture in the Avanade Division focusing on Business Intelligence.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2022-01-24

Comments For This Article




Friday, May 31, 2019 - 12:06:56 PM - Douglas Back To Top (81279)

It appears that when using a negative start location, the length is applied as if the string started at the negative location. In your example of starting at -2, the resulting length at a start location of 1 would be 2, not 3.


Friday, May 31, 2019 - 10:34:13 AM - Jim Back To Top (81275)

Hey,

Regarding the Python segment, you say:

"In the next line of code, we are accessing the first character of the string literal by accessing the array element at position 1, which is the first letter of the string."

You're actually accessing the second character of the string, as I believe in Python the first would be accessed using '0'. The code output seems to reinforce this too with it displaying 'i' rather than 'S'.

Cheers















get free sql tips
agree to terms