1. Introduction
Python is a popular programming language used in a wide range of applications because of its simple syntax and ease of use. String manipulation is one of the fundamental skills you cannot avoid when learning Python, and it plays an important role in data analysis, text processing, and log analysis. Splitting strings is essential for data preprocessing and organization. Python has the split() method, a convenient feature that splits a string on a single delimiter. However, depending on the type and format of the data, you may need to handle multiple delimiters at once. The standard split() method cannot do this in a single call, which can complicate data processing. In this article, we explain how to split strings using multiple delimiters in Python. Specifically, we cover the following topics:
- Basics and limitations of the split() method
- Flexible splitting techniques using regular expressions
- Practical examples such as CSV data processing and log analysis
- Tips and best practices for writing efficient and error-resistant code

2. Basics and Limitations of the split() Method
When splitting strings in Python, the split() method is the most basic approach. This method is very easy to use, and even beginners can handle it intuitively. However, because of its simplicity, it has some limitations. In this section, we explain the basic usage of the split() method and its limitations.
Basic Usage of the split() Method
The split() method splits a string by a specified delimiter and returns a list. Below is a basic usage example.
# Split a comma-separated string
text = "apple,banana,grape"
result = text.split(",")
print(result)
# Output: ['apple', 'banana', 'grape']
In the code above, a comma (,) is specified as the delimiter. The string is split at each comma, and the parts are returned as a list.
Default Behavior
If no delimiter is specified, split() uses whitespace characters (spaces, tabs, newlines, etc.) as the default delimiter. Consecutive whitespace is treated as a single delimiter, making it convenient for handling formatted text.
# Use whitespace by default
text = "Hello Python World"
result = text.split()
print(result)
# Output: ['Hello', 'Python', 'World']
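The difference between calling split() with no argument and passing an explicit space is easy to overlook. The following is a minimal sketch (the sample string is made up for illustration) showing that split() collapses runs of whitespace, while split(" ") treats every single space as its own delimiter and therefore produces empty strings:

```python
# Hypothetical sample text with irregular spacing (illustration only)
text = "  Hello   Python\tWorld  "

# No argument: runs of any whitespace are collapsed and the edges are ignored
default_result = text.split()

# Explicit single space: every space is a delimiter, so empty strings appear
# and the tab is NOT treated as a delimiter
space_result = text.split(" ")

print(default_result)
print(space_result)
```

If the input may contain tabs or repeated spaces, the no-argument form is usually the safer default.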
Limitations of the split() Method
The split() method is convenient, but it has several important limitations.
- Can only specify a single delimiter
- split() accepts only one delimiter string per call. Therefore, it is unsuitable when you want to handle multiple different delimiters at once.
# When you want to split by both commas and semicolons
text = "apple,banana;grape"
result = text.split(",")
print(result)
# Output: ['apple', 'banana;grape'] → cannot handle semicolons
- Does not support regular expressions
- It cannot perform flexible splitting based on specific patterns (e.g., consecutive spaces or particular symbols).
- May include empty elements
- If delimiters appear consecutively, the result may include empty elements.
# When delimiters appear consecutively
text = "apple,,banana"
result = text.split(",")
print(result)
# Output: ['apple', '', 'banana']
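For a small, fixed set of single-character delimiters, one common workaround that stays within plain string methods is to normalize every delimiter to one character first and then call split() once. This is only a sketch of that idea, not something the split() method does for you, and it becomes unwieldy as the number of delimiters grows:

```python
text = "apple,banana;grape"

# Normalize every semicolon to a comma, then split once on commas
normalized = text.replace(";", ",")
result = normalized.split(",")

print(result)
# Output: ['apple', 'banana', 'grape']
```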
Next Steps
To overcome these limitations, Python’s regular expression module (re) is effective. In the next section, we explain, with concrete examples, how to split flexibly on multiple delimiters using regular expressions.
3. Splitting with Multiple Delimiters Using Regular Expressions
By using Python’s re module, you can split a string on multiple delimiters at once. This lets you handle complex cases that split() cannot address.
Basic Method for Using Regular Expressions
To use regular expressions, import Python’s re module and call the re.split() function. This function splits a string wherever the specified regular expression pattern matches.
import re
# Specify multiple delimiters
text = "apple, banana; grape orange"
result = re.split(r"[,\s;]+", text)
print(result)
# Output: ['apple', 'banana', 'grape', 'orange']
- Structure of the Regular Expression:
- [,\s;]: matches any one of a comma (,), a whitespace character (\s), or a semicolon (;).
- +: treats one or more consecutive occurrences as a single delimiter.
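Sometimes it also matters which delimiter separated two fields. When the pattern is wrapped in a capturing group, re.split() includes the matched delimiters in the returned list. The sketch below uses a made-up input and slices the result into fields and delimiters; the variable names are illustrative only:

```python
import re

text = "apple,banana;grape"

# The parentheses form a capturing group, so the delimiters are kept
parts = re.split(r"([,;])", text)
print(parts)
# Output: ['apple', ',', 'banana', ';', 'grape']

fields = parts[0::2]      # even positions hold the fields
delimiters = parts[1::2]  # odd positions hold the delimiters
print(fields)
print(delimiters)
```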
Example Application: Splitting with Complex Patterns
It is also possible to split on patterns such as runs of digits or symbols.
# Split on digits
text = "apple123banana456grape789"
result = re.split(r"\d+", text)
print(result)
# Output: ['apple', 'banana', 'grape', '']
- \d+: one or more consecutive digits. Because the text ends with digits, the last element is an empty string.
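When the goal is to collect the pieces between the digits rather than to cut at them, matching the wanted text directly avoids the trailing empty string. A minimal sketch using re.findall() with \D+ (one or more non-digit characters), offered as an alternative rather than a replacement for re.split():

```python
import re

text = "apple123banana456grape789"

# Collect every run of non-digit characters instead of splitting on digits
words = re.findall(r"\D+", text)

print(words)
# Output: ['apple', 'banana', 'grape']
```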
Performance Considerations
Regular expressions are extremely flexible and convenient, but overusing complex patterns can slow down processing. In particular, when handling large datasets, keep patterns as simple as the task allows. In the next section, we walk through concrete examples of real-world data processing and log analysis. Through these examples, you will learn how to apply regular expressions and split().
4. Learning String Splitting Through Concrete Examples
Here, we present concrete use cases of splitting strings with multiple delimiters in Python. We’ll explain techniques that are useful for real-world data processing and analysis, based on the following three scenarios.
Processing CSV Data: Handling Multiple Delimiters
CSV (Comma-Separated Values) is a fundamental format for data processing, but sometimes delimiters other than commas appear. In such cases, regular expressions allow flexible handling.
import re
# Data containing multiple delimiters
data = "apple, banana;grape orange"
result = re.split(r"[,\s;]+", data)
print(result)
# Output: ['apple', 'banana', 'grape', 'orange']
- Explanation of the Regular Expression:
- [,\s;]+: uses commas (,), whitespace (\s), and semicolons (;) as delimiters.
- +: handles consecutive delimiters as a single separator.
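When the same pattern is applied to many rows, it can be compiled once and reused. The sketch below assumes a small in-memory list of rows (the data and the names DELIMITERS, lines, and rows are invented for illustration). It does not handle quoted fields; for real CSV files, Python’s csv module is usually the safer choice.

```python
import re

# Compile once, reuse for every row
DELIMITERS = re.compile(r"[,\s;]+")

# Hypothetical rows with inconsistent delimiters
lines = [
    "apple, banana; grape",
    "kiwi;melon  peach",
]

rows = [DELIMITERS.split(line) for line in lines]

for row in rows:
    print(row)
# Output:
# ['apple', 'banana', 'grape']
# ['kiwi', 'melon', 'peach']
```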
Log File Analysis: Flexible Data Splitting
Log data often contains dates, log levels, messages, and other elements intermingled. Let’s see how to use multiple delimiters to break it into a parsable structure.
import re
# Sample log data
log = "2024-12-15 12:34:56 INFO: User logged in"
# Split the date, time, log level, and message
result = re.split(r"[-\s:]+", log)
print(result)
# Output: ['2024', '12', '15', '12', '34', '56', 'INFO', 'User', 'logged', 'in']
- Explanation of the Regular Expression:
- [-\s:]+: uses hyphens (-), whitespace (\s), and colons (:) as delimiters.
- As a result, the log line is split into individual elements, making analysis easier.
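Splitting on every delimiter also breaks the message into separate words, which is not always what you want. A possible variation, sketched here under the assumption that each line has the shape "date time LEVEL: message", is to split on whitespace only and cap the number of splits with the maxsplit argument of re.split():

```python
import re

log = "2024-12-15 12:34:56 INFO: User logged in"

# Split on whitespace at most three times so the message stays in one piece
date, time_of_day, level, message = re.split(r"\s+", log, maxsplit=3)
level = level.rstrip(":")  # "INFO:" -> "INFO"

print(date, time_of_day, level, message, sep=" | ")
# Output: 2024-12-15 | 12:34:56 | INFO | User logged in
```

Lines that do not follow the assumed shape would raise a ValueError on unpacking, so real log parsing would need a check for that.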
Text Data Cleansing: Removing Unwanted Symbols
In preprocessing text data, it’s often necessary to delete unwanted symbols and extract only the important words. Below is an example.
import re
# Sample text data
text = "Hello!! Welcome@@ to ##Python*** Programming."
# Remove specific symbols and split
result = re.split(r"[!@#*\s]+", text)
print(result)
# Output: ['Hello', 'Welcome', 'to', 'Python', 'Programming.']
- Explanation of the Regular Expression:
- [!@#*\s]+: uses !, @, #, *, and whitespace (\s) as delimiters. The trailing period stays attached to 'Programming.' because it is not part of the character class.
- With this approach, you can efficiently remove unwanted symbols from the data.
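If sentence punctuation such as the period should be dropped as well, one option is to add it to the character class and then discard the empty string left at the end. This is a sketch of that variation on the same sample text, not the only way to clean it:

```python
import re

text = "Hello!! Welcome@@ to ##Python*** Programming."

# Inside a character class, "." and "*" are literal characters
parts = re.split(r"[!@#*.\s]+", text)

# The text ends with a delimiter, so the last element is an empty string
words = [part for part in parts if part]

print(words)
# Output: ['Hello', 'Welcome', 'to', 'Python', 'Programming']
```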
Performance Comparison: split() vs re.split()
In practical scenarios, processing speed also matters. Let’s compare the performance of regex-based splitting (re.split()) and simple splitting (split()).
import re
import time
# Sample data
data = "apple banana grape orange " * 100000
# Processing time for split()
start = time.perf_counter()
result = data.split(" ")
end = time.perf_counter()
print(f"split() time: {end - start:.5f} seconds")
# Processing time for re.split()
start = time.perf_counter()
result = re.split(r"\s+", data)
end = time.perf_counter()
print(f"re.split() time: {end - start:.5f} seconds")
- The results depend on data size and delimiter complexity, but for simple splitting, split() is faster. Conversely, when flexibility is required, re.split() is effective.
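A single timed run can be noisy. As a rough sketch, the standard timeit module can repeat each call several times; the data size and repetition count below are arbitrary choices for illustration, and the absolute numbers will vary by machine:

```python
import re
import timeit

data = "apple banana grape orange " * 10_000
whitespace = re.compile(r"\s+")  # compiled once, outside the timed code

# Run each splitter 20 times and report the total elapsed time
split_time = timeit.timeit(lambda: data.split(), number=20)
regex_time = timeit.timeit(lambda: whitespace.split(data), number=20)

print(f"str.split(): {split_time:.4f} seconds")
print(f"re.split():  {regex_time:.4f} seconds")
```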

5. Cautions and Best Practices
When splitting strings with multiple delimiters in Python, you need to be careful to avoid errors and performance issues. This section explains correct implementation methods and best practices for writing efficient code.
Cautions
1. Pay attention to the structure of regular expressions
- When using regular expressions, it is important to verify that they operate as intended. Overly complex regular expressions reduce code readability and can cause bugs.
import re
# Overly complex example (splitting on an empty match requires Python 3.7+)
pattern = r"[,\s;]+|(?<=\w)(?=[A-Z])"
text = "apple, banana;GrapeOrange"
result = re.split(pattern, text)
print(result)
# Output: ['apple', 'banana', 'Grape', 'Orange']
- Solution: Aim for simple regular expressions
# Simple pattern
pattern = r"[,\s;]+"
text = "apple, banana; grape orange"
result = re.split(pattern, text)
print(result)
# Output: ['apple', 'banana', 'grape', 'orange']
2. Consider performance
- While regular expressions are flexible, they can be slower than plain string methods. This matters most with large datasets or real-time processing.
3. Remove empty elements
- When multiple delimiters appear consecutively, empty elements may appear in the result. Leaving them as‑is can affect data processing.
import re
text = "apple,,banana,,grape"
result = re.split(r",", text)
print(result)
# Output: ['apple', '', 'banana', '', 'grape']
# Remove empty elements
cleaned_result = [x for x in result if x]
print(cleaned_result)
# Output: ['apple', 'banana', 'grape']
4. Escape special characters
- In regular expressions, certain characters (e.g., ., *, +, ?) have special meanings, so they must be escaped when used as literal delimiters.
import re
# Use a period as the delimiter
text = "apple.banana.grape"
result = re.split(r"\.", text)
print(result)
# Output: ['apple', 'banana', 'grape']
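When the delimiters come from a list or from configuration, escaping each one by hand is error-prone. A minimal sketch using the standard re.escape() function to build the pattern; the delimiter list and sample text are made up for illustration:

```python
import re

# Hypothetical delimiters, each of which is special in a regular expression
delimiters = [".", "|", "+"]

# Escape each delimiter, then join the alternatives with "|"
pattern = "|".join(re.escape(d) for d in delimiters)

text = "apple.banana|grape+orange"
result = re.split(pattern, text)

print(result)
# Output: ['apple', 'banana', 'grape', 'orange']
```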
Best Practices
1. Pursue simplicity
- When a task can be handled with a simple split() call, it is more efficient not to use regular expressions.
2. Add comments to regular expressions
- Add comments to regular expressions to make them easier for other developers—or your future self—to understand.
import re
# Use commas, spaces, and semicolons as delimiters
pattern = r"[,\s;]+"
text = "apple, banana; grape orange"
result = re.split(pattern, text)
3. Consider edge cases
- Account for the possibility of empty strings or specially formatted inputs by adding exception handling and data‑cleaning steps to your code.
import re
def safe_split(text, pattern):
    if not text:
        return []  # Return an empty list for an empty string
    return re.split(pattern, text)
result = safe_split("", r"[,\s;]+")
print(result)
# Output: []
4. Verify performance
- If the same task can be achieved in multiple ways, perform timing tests or similar to determine which is more efficient.
5. Introduce unit tests
- When using complex splitting logic, create unit tests to ensure changes don’t affect other behavior.
import re
def test_split():
    text = "apple, banana;grape orange"
    result = re.split(r"[,\s;]+", text)
    assert result == ["apple", "banana", "grape", "orange"]
test_split()
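A single assertion covers only the happy path. One way to extend it, sketched below, is a small table of inputs and expected outputs that includes edge cases. The helper split_fields and the names FIELD_DELIMITERS and CASES are hypothetical, introduced only for this example; the helper also drops empty strings, which is a design choice rather than the behavior of re.split() itself.

```python
import re

FIELD_DELIMITERS = re.compile(r"[,\s;]+")

def split_fields(text):
    """Hypothetical helper: split on commas, whitespace, and semicolons, dropping empties."""
    return [part for part in FIELD_DELIMITERS.split(text) if part]

# (input, expected output) pairs, including edge cases
CASES = [
    ("apple, banana;grape orange", ["apple", "banana", "grape", "orange"]),
    ("apple", ["apple"]),             # no delimiter at all
    ("apple,,banana", ["apple", "banana"]),  # consecutive delimiters
    (",apple;", ["apple"]),           # leading and trailing delimiters
    ("", []),                         # empty input
]

def test_split_fields():
    for text, expected in CASES:
        assert split_fields(text) == expected, (text, expected)

test_split_fields()
```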
By following these cautions and best practices, you can perform string splitting with multiple delimiters efficiently and safely.
6. Summary
In this article, we covered string splitting in Python from the basics to more advanced techniques, focusing especially on handling multiple delimiters. Below is a summary of the key points from each section.
Review of Key Points
- Fundamentals and Limitations of the split() Method
- The split() method is the basic way to split a string using a single delimiter, but it cannot handle multiple delimiters or complex patterns.
- Flexible Splitting Using Regular Expressions
- By using Python’s regular expression module (re), you can split using multiple delimiters or specific string patterns.
- Regular expressions are extremely powerful and well-suited for complex data processing.
- Concrete Use Cases
- Through practical examples such as cleaning CSV data, log analysis, and preprocessing text data, we learned how to apply these techniques.
- Choosing methods with performance in mind is also a crucial skill in real-world work.
- Cautions and Best Practices
- Keeping regular expressions simple and handling edge cases properly to prevent errors are key to efficient coding.
- It’s also important to develop the habit of benchmarking performance and selecting the optimal approach.
Next Steps
String manipulation in Python is a fundamental skill for data analysis and text processing. Working through the following next steps will deepen your expertise:
- Further Learning of Regular Expressions
- By learning advanced regex features (e.g., grouping, negative matching), you can handle even more complex data processing.
- Practical Application
- Actively apply the skills learned in this article to your daily data processing and software development tasks.
- Pursuing Automation and Efficiency
- Develop the habit of writing high-quality code through unit testing and code reviews.
When This Article Is Useful
- When data cleaning or preprocessing is required.
- Projects that involve analyzing system logs or CSV data.
- Scenarios that prioritize performance and code maintainability.