String Manipulations

  • Introduction

    A “string” is a static (as opposed to variable) set of characters not intended for mathematical manipulation or variable assignment. The following are examples of strings: “Homer likes donuts.”, “Bart is 10 years old.”, “Simpson, Marge”, “123 Evergreen Terrace”. Often, during the ETL stage of the BI stack, we need to clean received data. This is the “transform” part of ETL. It is not uncommon for us to receive data in an inconvenient format, or at least in a format different from how we will implement it. To remedy this, we need to clean the data by splitting parts in pieces, shaving off pieces, and combining pieces, among other things. In this section we will introduce you to the following to make you more powerful data cleaners: Len, Count, Left, Right, Mid, Split, UCase, LCase, & (concatenate), and Replace .

  • Len and Count

    Len

    Sometimes we need to know the length (number of characters) in a string. Len and Count allow us to do this. For example, Len("Homer likes donuts.") will return the value 19, which is the exact number of characters in the string, including spaces and punctuation. This can be used dynamically as well. Let’s say the ActiveCell contains the phrase: “Homer likes donuts.” We could get the number of characters in this cell by using the following code: Len(activecell). This will also return the value 19.

    Count

    Count can also be used for this. For example, the code ActiveCell.Characters.Count will return the number of characters of the ActiveCell (including spaces and punctuation). Count can be used for other things as well. For example, if we want to know how many objects (in this case cells) are in a selected range, we might use the following code: Selection.Count. If we want to know how many columns are in the CurrentRegion of the ActiveCell, we could use the following code: ActiveCell.CurrentRegion.Columns.Count

    The same could be done with rows: ActiveCell.CurrentRegion.Rows.Count

    Count can be used anytime you need to count something. Fancy that.

  • Left and Right

    Left and Right give us the left or right number of characters of a specified string. This can be useful when you have a string that starts with a fixed length code. For example: “ID50056Chopsticks”, “ID00332Soy Sauce Packet”, “ID11252Wok”. If we just want the ID, but not the description, then we could use the following code: Left(ActiveCell, 7). If “ID50056Chopsticks” is in the ActiveCell, then this code will return “ID50056” – the first 7 characters starting from the left.

    We can do the same with Right. Let’s say we have a set of social security numbers, and we only want to keep the last four digits. We could use code like this: Right(ActiveCell, 4)

    If the ActiveCell contained this value: “123456789”, then the code would return: “6789”.

    Let’s get complicated. Let’s say we want to take just the description from the Left example above. This means we need to remove the left 7 characters. One way to do this would be to use the following code: Right(ActiveCell, Len(ActiveCell)-7). This code returns all but the left 7 characters of the ActiveCell. This way, no matter how long the description, it will always give us the full description and nothing else. However, there is actually a simpler way to do this one using Mid.

  • Mid and Split

    Mid

    To do the above example using Mid , you would use the following code: Mid(ActiveCell,8). This returns the remaining portion of the string after starting at the 8 th position. If the ActiveCell contains: “ID50056Chopsticks”, then this code returns “Chopsticks”. Mid can also be used to just take out the middle of a string. For example, if we only want the numeric portion of the ID, we could use the following code:Mid(ActiveCell,3,5).This code returns “50056” – i.e. the 5 characters starting with the 3rd character in the string “ID50056Chopsticks”.

    Split

    Split is a way to put parts of the string into an array for later use. An array stores a series of objects. In this case, the objects are strings. Let’s say we want to store the first and last name in an array. If the string was “Simpson, Marge”, we could use the following code to store the two parts of the string in an array called "nameArray": nameArray=Split(ActiveCell,", "). In this case, the following code would return “Marge”: nameArray(1). The array starts with the 0th position, so nameArray(0) would return “Simpson”. Split uses a delimiter to split the string. In this example, I have used a comma followed by a space to split the string. The code then removes the delimiter from the string, stores the value preceding it in the 0th position of the array, and then searches for another instance of the delimiter. If no more instances of it exist, then it places the remaining portion of the string in the last position of the array.

    Let’s try a more complicated example. Let’s use a string of zip codes separated by commas: “90210,24998,90265,44117,84042”. If we split this using the comma as a delimiter, we will end up with an array of five zip codes. This code will store it in an array called "zipArray": zipArray=Split(ActiveCell,","), and then this code will return one of its values: zipArray(3) – in this case, “44117”. Remember, the array starts in the 0th position, so asking for the one in the 3rd position will return the 4th value.

  • LCase and UCase

    There are times when you’ll want to either format text as upper or lower case, or when you’ll want to do a string comparison while ignoring potential case differences. These are fairly simple situations. To make the string in the ActiveCell lower case, use the following code: LCase(ActiveCell). To make a string upper case, use the following code: UCase(ActiveCell). As with all the examples in this section, ActiveCell could be replaced with any string from a variable, object value, or text typed into the parentheses.

    To make a comparison between strings while ignoring case differences, simply LCase them or UCase them. This will assume all lower or upper case for the strings, and thereby ignore case. For example, if you have asked a user to type in a response as “Y” for yes or “N” for no, they might type “Y” or “y”, “N” or “n”. In VBA, string comparisons are case sensitive, so if we have a conditional set up as:If userResponse = “Y” Then…, if the user has entered a lower case y, then this will evaluate to FALSE. To fix this, we could change the code to: If UCase(userResponse) = “Y” Then… This way, whether the user has entered “Y” or “y”, it will be comparing the upper case version to “Y”.

  • Concatenation

    Concatenation is a fancy word for ‘sticking things together’. Concatenating these two strings: “Lisa” “Simpson” would return “LisaSimpson”. Concatenating these three strings: “Lisa” “ is” “ smrt!” would return “Lisa is smrt!” The concatenation symbol is “&”. To concatenate the previous example, we would use the following code: “Lisa” & “ is” & “ smrt!” We can also concatenate variable and object values with strings. Let’s say we have three variables: fname, lname, and age. We could construct a sentence using the following code: fname & “ “ & lname & “ is “ & age & “ years old.”If fname=”Lisa”, lname=”Simpson”, and age=8, then the code would return: “Lisa Simpson is 8 years old.” In a simple point of sales system, we might concatenate the line item description, quantity and price as follows: description & “ ($” & price & “ each)” & “ x” & quantity & “ ---------- $“ & price * quantity. This might return: “Candy Canes ($0.25 each) x12 ---------- $3.00”.

  • Replace

    Another useful string manipulation is the find and replace function. Let's say you have a set of social security numbers with hyphens in them, but you want to remove those hyphens (e.g., go from 123-45-6789 to 123456789). The code to do this is fairly simple: Replace(activecell.value,"-",""). This code finds any hyphen characters in the activecell and replaces those hyphens with nothing (empty quotes). Here is another example. If we want to replace all spaces with underscores in a set of names, the code would look like this: Replace(activecell.value," ","_"). This would turn a name like "Homer J Simpson" into "Homer_J_Simpson".

  • Practice

    Here is a video to demonstrate putting all of these string manipulations together (except Replace). Here are the workbooks used in the video and the solution.