How to Split a String in Python
Jan 26, 2022

A string is a sequence of characters that represents text. In programming, strings are often used to store text that will be displayed to a user or passed to another program.

A character is a symbol, but computers can only store binary data: 1s and 0s. Each character is therefore stored as a number, and the program must decode those 1s and 0s to display the character as a symbol on the screen. In Python 3, any string declared with quotes is a sequence of Unicode characters.
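As a small illustration of the point above, the following sketch uses only built-in functions to show the number behind a character and the bytes a string can be stored as. The sample values are arbitrary, and UTF-8 is just one of several possible encodings.

```python
# Each character in a Python 3 string is a Unicode code point.
letter = "A"
code_point = ord(letter)            # integer code point for "A"
print(code_point)                   # 65

# Encoding turns text into bytes; decoding turns bytes back into text.
word = "café"
encoded = word.encode("utf-8")      # bytes object
decoded = encoded.decode("utf-8")   # back to a str
print(encoded)                      # b'caf\xc3\xa9'
print(decoded == word)              # True
```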

To easily split a string in Python, just use the split() method. Here is a simple example that splits a sentence into a list, with each word becoming a separate item in the list.

txt = "This is a string" #declare string value

print(txt) #print the string

x = txt.split() #assign variable x to separated string value

print(x) #print the split string to the console

 

The console output would be as follows:

This is a string

['This', 'is', 'a', 'string']

Let’s take a closer look at how to split a string in Python and see more examples like this one.

Brief Introduction to Splits

The split() method takes a string and splits it into a list of substrings wherever a delimiter occurs. It is a built-in string method in Python and is easy to use. It has two parameters, sep and maxsplit:

str.split(sep=None, maxsplit=-1)

Here, “sep” stands for separator or delimiter. If it is omitted or set to None, the string is split on runs of whitespace. A delimiter is the character or sequence of characters that separates the substrings in the string. Any string can be specified as the delimiter, including one longer than a single character, but split() accepts only one delimiter per call.

On the other hand, “maxsplit” specifies the maximum number of splits to perform. If maxsplit is not set, it defaults to -1, which means there is no limit on the number of splits.
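A minimal sketch of how the two parameters behave, using made-up sample strings:

```python
record = "2022-01-26"   # sample value chosen for illustration

# An explicit separator: split at every hyphen.
parts = record.split("-")
print(parts)             # ['2022', '01', '26']

# A separator may be longer than one character.
path = "a::b::c"
nested = path.split("::")
print(nested)            # ['a', 'b', 'c']

# maxsplit caps the number of splits; the rest of the string stays intact.
first_only = record.split("-", maxsplit=1)
print(first_only)        # ['2022', '01-26']
```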

Splitting text strings is useful in data analysis, where collected data often arrives as delimited text. Commas, colons, spaces, and quotation marks are often selected as delimiters.

How to Split a String in Python 

There are various types of splits that can be performed in Python, and there are different methods aside from split() that can split a string:

  • str.rsplit(sep=None, maxsplit=-1) functions the same as split() but starts splitting from the right.
  • re.split(pattern, string, maxsplit=0, flags=0) splits a string with a regular expression, similar to operations found in Perl.
  • str.splitlines(keepends=False) splits a string at line boundaries.
  • a[start:stop:step] is slice notation, which can be used to extract substrings by position.
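The methods listed above can be compared side by side. This is a rough sketch with invented sample strings, not an exhaustive tour of each method:

```python
import re

filename = "archive.tar.gz"   # hypothetical file name

# split() counts splits from the left, rsplit() from the right.
left = filename.split(".", 1)
right = filename.rsplit(".", 1)
print(left)         # ['archive', 'tar.gz']
print(right)        # ['archive.tar', 'gz']

# splitlines() drops line endings unless keepends is True.
poem = "first line\nsecond line\n"
lines = poem.splitlines()
lines_kept = poem.splitlines(keepends=True)
print(lines)        # ['first line', 'second line']
print(lines_kept)   # ['first line\n', 'second line\n']

# re.split() accepts a pattern, here one or more digits.
tokens = re.split(r"\d+", "a1b22c")
print(tokens)       # ['a', 'b', 'c']
```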

How to Split a String at Each Comma

A Comma Separated Values file, more commonly known as a .csv file, is a plain-text file that contains data separated by commas. These types of files are commonly seen in data aggregation and collection.

Splitting each value delimited by a comma can help analyze these data sets. Use the below example code to declare a string and separate each value in the list at each comma.

txt = "Abc,De,F" #declare string value

print(txt) #print the string before separating

x = txt.split(",") #assign variable x to separated string value and set sep to ','

print(x) #print the comma separated list

 

Here is the console output:

Abc,De,F

['Abc', 'De', 'F']

Please note, the commas in the console output are not part of the values in the string. The commas have been removed from the string and each value is contained separately in the list denoted by the variable x.
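Real comma-separated data often contains stray spaces or empty fields. The sketch below, using a made-up row, shows what split(",") returns in those cases and one possible way to tidy the result:

```python
row = "Abc, De,,F"   # sample row with a space and an empty field

# Splitting on "," keeps surrounding spaces and yields '' for empty fields.
raw_fields = row.split(",")
print(raw_fields)     # ['Abc', ' De', '', 'F']

# strip() removes leading and trailing whitespace from each field.
clean_fields = [field.strip() for field in raw_fields]
print(clean_fields)   # ['Abc', 'De', '', 'F']
```

For files whose fields may themselves contain quoted commas, the standard library csv module is usually a safer choice than split(",").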

A list is a sequence type in Python. The two other basic sequence types are tuple and range, and strings are themselves a text sequence type. There are six principal built-in types in Python:

  • Numerics
  • Mappings
  • Classes
  • Instances
  • Exceptions
  • Sequences

However, the main types used in string manipulation are string, list, and tuple. 

How to Split a String with No Arguments 

If no parameters or arguments have been specified when calling the split() method, the parameters will default to sep=None and maxsplit=-1. This means that the separator or delimiter will be set to whitespace and splits will occur until the end of the string. See the below sample code:

txt = "This is a sentence." #declare string value

print(txt) #print the string

x = txt.split() #assign variable x to separated string value, sep and maxsplit are not assigned values

print(x) #print the list separated at each whitespace

 

Console output will be as follows:

This is a sentence.

['This', 'is', 'a', 'sentence.']

Notice how each word is separated in the list when a space is encountered in the original string. The period is also included in the last list item because no whitespace separates it from the word, and split() was not instructed to remove periods.
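Calling split() with no arguments is not quite the same as calling split(" "). The following sketch, with sample strings chosen for illustration, shows the difference and one possible way to drop the trailing period:

```python
spaced = "  This   is a\tsentence.\n"   # extra spaces, a tab, and a newline

# With no arguments, any run of whitespace counts as a single separator
# and leading/trailing whitespace is ignored.
default_split = spaced.split()
print(default_split)   # ['This', 'is', 'a', 'sentence.']

# With an explicit " ", every single space is a separator,
# so consecutive spaces produce empty strings.
single_space = "a  b".split(" ")
print(single_space)    # ['a', '', 'b']

# rstrip(".") removes trailing periods from each word.
no_periods = [word.rstrip(".") for word in default_split]
print(no_periods)      # ['This', 'is', 'a', 'sentence']
```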

How to Split a String with a Limited Number of Splits 

The other parameter allowed when calling the split() method is maxsplit. This parameter sets the maximum number of splits performed, so the resulting list has at most maxsplit + 1 items. See the below example with maxsplit set to 1 and the delimiter set to the default value.

txt = "This is a sentence." #declare string value

print(txt) #print the string

x = txt.split(None,1) #assign variable x to separated string value, sep is assigned to None and maxsplit is assigned to 1

print(x) #print the list separated at each whitespace

 

The console output is below:

This is a sentence.

['This', 'is a sentence.']

Notice how only 1 split occurred, so there are only two items in the list. “This” is one value in the list and “is a sentence.” is one value in the list. Let’s test the maxsplit set to a value of 2.

txt = "This is a sentence." #declare string value

print(txt) #print the string

x = txt.split(None,2) #assign variable x to separated string value, sep is set to None and maxsplit to 2

print(x) #print the list separated at each whitespace

 

Console output:

This is a sentence.

['This', 'is', 'a sentence.']

Notice now we have 3 items in the list and 2 splits occurred. What if we set the maxsplit to something greater than the number of splits needed to pass through the inputted string?

txt = "This is a sentence." #declare string value

print(txt) #print the string

x = txt.split(None,10) #assign variable x to separated string value, sep is set to None and maxsplit to 10

print(x) #print the list separated at each whitespace

 

Console output:

This is a sentence.

['This', 'is', 'a', 'sentence.']

Notice, we do not receive an error if maxsplit is greater than the number of splits available for the inputted string.
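A common use of maxsplit is separating a key from a value that may itself contain the delimiter. The sketch below assumes a hypothetical "key=value" string:

```python
setting = "equation=x=y+1"   # hypothetical input; the value contains "="

# maxsplit=1 splits only at the first "=", so the value stays in one piece.
key, value = setting.split("=", 1)
print(key)     # equation
print(value)   # x=y+1

# The result never has more than maxsplit + 1 items.
limited = "a b c d".split(None, 2)
print(len(limited))   # 3
```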

Additional Examples of a Split String 

While Python has the split() method available to quickly split strings, there are a variety of other ways to split strings and different use cases for the split() method.

How to Split a String in Python to Create a Word Counter 

To create a word counter in Python, use split() and len() combined. The built-in function len() returns the length of an object, such as the number of items in a list.

string = "count the words in this sentence using len() and split()" #declare the string variable

print(string) #print the string to the console

wordcount = len(string.split()) #use len() after using split() on the string

print("Word count: " +str(wordcount)) #print the word count to the console

 

Console output:

count the words in this sentence using len() and split()

Word count: 10

Note how you can call len() and split() on the same line. The two calls can also be written on separate lines, but combining them uses fewer lines of code and is still readable.
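The same idea extends from counting all words to counting each distinct word. This is one possible approach, using collections.Counter from the standard library and a made-up sentence:

```python
from collections import Counter

sentence = "the cat and the hat and the bat"   # sample text

words = sentence.split()
total = len(words)
print(total)                        # 8

# Counter tallies how many times each word appears.
frequencies = Counter(words)
print(frequencies["the"])           # 3
print(frequencies.most_common(2))   # [('the', 3), ('and', 2)]
```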

How to Split Lines from a Text File in Python 

To split lines from a text file, open the file in Python and read its contents. For this example, our .txt file contains the below random car data corresponding to make, model, year, and a randomly generated VIN.

Volkswagen,Cabriolet,1991,JH4DC53874S439387

Honda,Civic,1992,3GTU2YEJ9DG224736

BMW,8 Series,1993,3N1CN7AP5EL329941

Pontiac,Vibe,2007,1G6DE5E54D0848444

Save this as car_data.txt, with one record per line, then use the code below to read the file and call splitlines().

with open("car_data.txt",'r') as data: #open the file in read mode designated by selecting the parameter ‘r’

    file = data.read().splitlines() #call the read() method to read the contents of the car_data file and then splitlines() to create a new item after each line

print(file) #print the file to the console

 

Console output:

['Volkswagen,Cabriolet,1991,JH4DC53874S439387', 'Honda,Civic,1992,3GTU2YEJ9DG224736', 'BMW,8 Series,1993,3N1CN7AP5EL329941', 'Pontiac,Vibe,2007,1G6DE5E54D0848444']

The same method works on a string declared directly in code. Note that the spaces around each \n in this example are kept in the resulting list items:

car_data = ("Volkswagen,Cabriolet,1991,JH4DC53874S439387 \n Honda,Civic,1992,3GTU2YEJ9DG224736 \n BMW,8 Series,1993,3N1CN7AP5EL329941 \n Pontiac,Vibe,2007,1G6DE5E54D0848444") #declare string using \n to signify a line break

x = car_data.splitlines() #declare the variable x as the result of calling splitlines()

print(x)

 

Console output:

['Volkswagen,Cabriolet,1991,JH4DC53874S439387 ', ' Honda,Civic,1992,3GTU2YEJ9DG224736 ', ' BMW,8 Series,1993,3N1CN7AP5EL329941 ', ' Pontiac,Vibe,2007,1G6DE5E54D0848444']
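Once the lines are separated, each line can be split again at its commas. The sketch below is one way to combine splitlines(), split(","), and strip(). The dictionary keys are names chosen for illustration, and the data is a shortened copy of the sample above:

```python
car_data = (
    "Volkswagen,Cabriolet,1991,JH4DC53874S439387 \n"
    " Honda,Civic,1992,3GTU2YEJ9DG224736"
)

records = []
for line in car_data.splitlines():
    # Split each line into fields and trim stray spaces around them.
    make, model, year, vin = [field.strip() for field in line.split(",")]
    records.append({"make": make, "model": model, "year": int(year), "vin": vin})

print(records[0]["year"])   # 1991
print(records[1]["make"])   # Honda
```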

How to Split a String with Multiple Delimiters 

To split a string with multiple delimiters, import the “re” module, whose name stands for “regular expressions.” More information can be found in the Python documentation.

import re

text = "Split, this sentence. with; the chosen, characters." #declare the string variable

print(re.split("[;,.] ", text)) #print the split string

 

Console output:

['Split', 'this sentence', 'with', 'the chosen', 'characters.']

Notice the square brackets are used to denote the set of characters that will be used to split. The pattern also ends with a space, so the sentence is split only where a semicolon, comma, or period is followed by a space. The final period has no space after it, which is why it remains in the last item.
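If the space after each delimiter should be optional, the pattern can be loosened. The following is a sketch of one such variation, not the only way to write it:

```python
import re

text = "Split, this sentence. with; the chosen, characters."

# [;,.] matches any one delimiter; \s* also consumes whitespace after it, if any.
parts = re.split(r"[;,.]\s*", text)
print(parts)     # ['Split', 'this sentence', 'with', 'the chosen', 'characters', '']

# The final period leaves an empty string at the end; filter it out.
cleaned = [part for part in parts if part]
print(cleaned)   # ['Split', 'this sentence', 'with', 'the chosen', 'characters']
```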

How to Split a String Into an Array of Characters 

Simply pass the string to the built-in list() function to split it into a list of characters.

text = "Python"

char_list = list(text)

print(char_list)

 

Console output:

['P', 'y', 't', 'h', 'o', 'n']
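The reverse operation is join(), which glues a list of characters back into a string. A brief sketch:

```python
text = "Python"

chars = list(text)          # split into individual characters
print(chars)                # ['P', 'y', 't', 'h', 'o', 'n']

rebuilt = "".join(chars)    # join the characters with an empty separator
print(rebuilt)              # Python

dashed = "-".join(chars)    # any string can serve as the joining separator
print(dashed)               # P-y-t-h-o-n
```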

How to Split a String in Python Using Substrings 

We can use slice notation to split a string in Python. Slice notation uses the following parameters:

a[start:stop:step]

 

To get the first two characters of a string, use the following method:

text = "Python"

print(text[:2]) #indicate the stop as 2 to get the first two characters of a string

 

Console output:

Py

To get the last two characters of a string, use the following method:

text = "Python"

print(text[-2:]) #indicate the start as -2 to get the last two characters of a string

 

Console output:

on

Slice notation is a powerful feature in Python. It is concise, returns a new string without changing the original, and works well when the positions to split at are known in advance.
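Slicing can also split a string into fixed-size pieces. The helper below, named chunks here purely for illustration, is a minimal sketch of that idea, followed by two uses of the step parameter:

```python
text = "Python"

def chunks(value, size):
    """Return consecutive slices of `value`, each at most `size` characters long."""
    return [value[i:i + size] for i in range(0, len(value), size)]

pairs = chunks(text, 2)
uneven = chunks(text, 4)
print(pairs)           # ['Py', 'th', 'on']
print(uneven)          # ['Pyth', 'on']

# The step parameter selects every n-th character; a negative step walks backwards.
every_other = text[::2]
reversed_text = text[::-1]
print(every_other)     # Pto
print(reversed_text)   # nohtyP
```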

Wrapping It Up

Splitting a string in Python is a great way to learn about programming in that language. There are also real-world applications for splitting a string, primarily in data analysis where it is useful to extract information from .csv or .txt files.

Python can be used for a variety of purposes, from web development to scientific programming. While other languages may be better suited for specific tasks, Python is a jack-of-all-trades that can do almost anything.
