Get File Names in a Folder with Python (Beginner’s Guide)

1. Introduction

Retrieving filenames from a folder in Python is a very useful skill for beginner to intermediate programmers. Being able to get filenames can streamline bulk data processing and file operations, helping with automation and data organization. In this article, we explain step-by-step how to retrieve filenames from a folder using Python. We include code examples and clear explanations so even first-time learners can follow along.

2. Basic filename retrieval using the os module

Python’s standard-library os module makes it easy to retrieve the names of files in a folder. The code is short and intuitive, so this approach suits those who are new to Python.

Basic usage of the os.listdir() function

os.listdir() returns a list of the names of all items (files and folders) in the specified directory.
import os

# Get files and directories in the folder
folder_path = "sample_folder"
items = os.listdir(folder_path)

print("Items in the folder:")
print(items)
When you run the above code, the file and directory names in the specified folder are displayed as a list.

How to extract only filenames

The retrieved list may include directory names. By using os.path.isfile(), you can extract only the files.
import os

# Get only file names in the folder
folder_path = "sample_folder"
files = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]

print("Files in the folder:")
print(files)
In this code, each item is checked to see if it is a file, and only the file names are stored in the list.

Example

For example, to list the image files saved in a folder, filter by extension as follows.
import os

# Get image files (lower() makes the check case-insensitive, so "PHOTO.JPG" also matches)
folder_path = "images"
image_files = [f for f in os.listdir(folder_path) if f.lower().endswith((".png", ".jpg", ".jpeg"))]

print("Image file list:")
print(image_files)

Advantages of this method

  • Simple and easy to understand.
  • Can be implemented using only Python’s standard library.
  • Ideal for file operations in small folders.

Notes

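  • Hidden files (for example, names beginning with a dot on Linux and macOS) and files with unusual names are included in the results, so filter as needed.
  • os.listdir() builds the whole list in memory, and each os.path.isfile() check costs an extra file-system call, so very large directories can be slow; os.scandir() (Python 3.5+) is usually faster there.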
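
As a rough illustration of the two notes above, the following sketch lists only regular files and skips names that start with a dot. The function name list_visible_files is hypothetical, and treating a leading dot as "hidden" is an assumption that matches Linux and macOS conventions; Windows marks hidden files with a file attribute instead.

```python
import os


def list_visible_files(folder_path):
    """Return sorted names of regular files in folder_path, skipping dot-files."""
    names = []
    # os.scandir() yields DirEntry objects that cache file-type information,
    # so entry.is_file() usually avoids an extra system call per item.
    with os.scandir(folder_path) as entries:
        for entry in entries:
            if entry.is_file() and not entry.name.startswith("."):
                names.append(entry.name)
    return sorted(names)
```

Calling list_visible_files("sample_folder") would return a plain list of names, just like the os.listdir() examples in this section.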

3. Using the glob module for wildcard searches

Python’s glob module lets you retrieve filenames that match a specific pattern. Unlike os.listdir(), it allows you to set flexible search criteria using wildcards.

Basic usage of the glob module

By using glob.glob(), you can retrieve files and directories that match the specified pattern.
import glob

# Get all files in the folder
folder_path = "sample_folder"
files = glob.glob(f"{folder_path}/*")

print("Files in the folder:")
print(files)
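This code retrieves the paths of all items (files and directories) in the specified folder, except names beginning with a dot, which * does not match by default. Unlike os.listdir(), each result includes the folder prefix (for example, sample_folder/a.txt).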

Retrieve files matching a specific pattern

For example, to retrieve only files with a specific extension, write the following.
import glob

# Get files with the specified extension
folder_path = "sample_folder"
text_files = glob.glob(f"{folder_path}/*.txt")

print("Text files:")
print(text_files)
In this example, only files with the .txt extension are retrieved. A glob pattern cannot list several extensions at once, so to match multiple extensions, combine the results of separate calls as follows.
import glob

# Get files by specifying multiple extensions
folder_path = "sample_folder"
files = glob.glob(f"{folder_path}/*.txt") + glob.glob(f"{folder_path}/*.csv")

print("Text and CSV files:")
print(files)

Retrieving files recursively

By combining the ** pattern with the recursive=True option of glob.glob(), you can retrieve files in subdirectories as well. Note that the results also include the paths of the subdirectories themselves.
import glob

# Recursively get all files in subdirectories
folder_path = "sample_folder"
files = glob.glob(f"{folder_path}/**/*", recursive=True)

print("All files in subdirectories:")
print(files)

Examples of wildcard usage

  • *: Matches any string of characters, including an empty one.
  • ?: Matches any single character.
  • [abc]: Matches any single character listed within the brackets.
import glob

# Get any files whose filename starts with "data"
folder_path = "sample_folder"
files = glob.glob(f"{folder_path}/data*")

print("Files starting with 'data':")
print(files)
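The ? and [abc] wildcards can be tried without touching the file system. The sketch below uses fnmatch.filter() from the standard library, which applies the same shell-style matching rules that glob uses for file names; the names in the list are made up for illustration.

```python
import fnmatch

# Hypothetical file names, used only to demonstrate the wildcards.
names = ["data1.csv", "data2.csv", "data10.csv", "dataA.csv", "notes.txt"]

# "?" matches exactly one character, so "data10.csv" is excluded.
single_char = fnmatch.filter(names, "data?.csv")

# "[12]" matches one character that is either "1" or "2".
one_or_two = fnmatch.filter(names, "data[12].csv")

# "[0-9]*" matches one digit followed by any run of characters (possibly empty).
starts_with_digit = fnmatch.filter(names, "data[0-9]*.csv")

print(single_char)        # ['data1.csv', 'data2.csv', 'dataA.csv']
print(one_or_two)         # ['data1.csv', 'data2.csv']
print(starts_with_digit)  # ['data1.csv', 'data2.csv', 'data10.csv']
```

The same patterns can be passed to glob.glob() after the folder prefix, for example f"{folder_path}/data?.csv".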

Advantages and use cases

  • Allows flexible pattern specification and is tailored for filename searches.
  • Compared to the os module, code tends to be more concise.
  • Recursive searches are easy to perform.

Notes

  • An empty list is returned when nothing matches the pattern, so check the results.
  • Recursive searches can return a very large number of files, so apply appropriate filtering.
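
One possible way to act on both notes is to iterate lazily with glob.iglob() and keep only regular files. This is a minimal sketch; the helper name iter_matching_files and its default pattern are assumptions made for illustration.

```python
import glob
import os


def iter_matching_files(folder_path, pattern="**/*"):
    """Lazily yield paths under folder_path that match pattern and are files."""
    # glob.escape() protects wildcard characters that happen to be in the folder name.
    search = os.path.join(glob.escape(folder_path), pattern)
    # glob.iglob() returns an iterator, so results are produced one at a time
    # instead of being collected into a single large list.
    for path in glob.iglob(search, recursive=True):
        if os.path.isfile(path):
            yield path
```

Because the function is a generator, an empty result simply means the loop body never runs, so no separate check for an empty list is needed.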

4. The pathlib module: object-oriented path manipulation

Python’s pathlib module provides a modern, intuitive way to work with file paths in an object-oriented manner. This improves code readability and makes operations simpler.

Basic usage of the pathlib module

Using the Path object from the pathlib module, you can easily obtain the entries in a directory.
from pathlib import Path

# Get items in the folder
folder_path = Path("sample_folder")
items = list(folder_path.iterdir())

print("Items in the folder:")
print(items)
In this code, all items (files and folders) in the folder are returned as a list of Path objects.

Extracting only filenames

To get only files from the directory entries, use the is_file() method.
from pathlib import Path

# Get files in the folder
folder_path = Path("sample_folder")
files = [f for f in folder_path.iterdir() if f.is_file()]

print("Files in the folder:")
print(files)
In this code, directories are excluded and only files are stored in the list.

Filtering by extension

To retrieve only files with a specific extension, specify the extension as a condition.
from pathlib import Path

# Get files with a specific extension in the folder
folder_path = Path("sample_folder")
text_files = [f for f in folder_path.iterdir() if f.is_file() and f.suffix == ".txt"]

print("Text files:")
print(text_files)
The suffix attribute makes it easy to get a file’s extension.
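Besides suffix, path objects expose a few related attributes. The sketch below uses PurePosixPath so that it runs without any real files; the file names are hypothetical, and Path objects offer the same attributes.

```python
from pathlib import PurePosixPath

# PurePosixPath only parses the string; the file does not need to exist.
p = PurePosixPath("sample_folder/archive.tar.gz")

name = p.name          # "archive.tar.gz" (final path component)
stem = p.stem          # "archive.tar"    (name without the last suffix)
suffix = p.suffix      # ".gz"            (last suffix only)
suffixes = p.suffixes  # [".tar", ".gz"]  (all suffixes)

# suffix keeps the original case, so lower() is needed for a case-insensitive check.
is_text = PurePosixPath("REPORT.TXT").suffix.lower() == ".txt"  # True
```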

Retrieving files recursively

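To include files in subdirectories, use the rglob() method. As with iterdir(), the results include directories as well as files, so add an is_file() check if you need files only.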
from pathlib import Path

# Get all files in subdirectories
folder_path = Path("sample_folder")
files = list(folder_path.rglob("*"))

print("All files in subdirectories:")
print(files)
Additionally, you can recursively search for files that match a specific pattern.
from pathlib import Path

# Get files with a specific extension in subdirectories
folder_path = Path("sample_folder")
text_files = list(folder_path.rglob("*.txt"))

print("Text files retrieved recursively:")
print(text_files)

Benefits and use cases

  • Write intuitive, object-oriented code.
  • Recursive operations and filtering are easy to perform.
  • Available since Python 3.4 and recommended in recent Python versions.

Notes

  • Not available on unsupported Python versions (earlier than 3.4).
  • Be mindful of performance when operating on very large directories.

5. How to Recursively Retrieve Files in Subdirectories

When retrieving files from a folder, you may want to include files in subdirectories and retrieve them recursively. This section explains techniques for recursively obtaining files using the os module and other methods.

Using os.walk()

os.walk() recursively traverses the specified directory and its subdirectories. For each directory it visits, it yields the directory path, the names of its subdirectories, and the names of its files.
import os

# Retrieve all files in subdirectories
folder_path = "sample_folder"

for root, dirs, files in os.walk(folder_path):
    for file in files:
        print(f"File: {os.path.join(root, file)}")
Code explanation:
  • os.walk() yields a tuple of (folder path, subfolder list, file list) for each folder it visits.
  • Uses os.path.join() to construct the full file path.
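
Because os.walk() hands you the subfolder list for each folder, you can also control where the traversal goes. The following sketch skips certain folders entirely; the function name walk_files and the default folder names in skip_dirs are assumptions chosen for illustration.

```python
import os


def walk_files(folder_path, skip_dirs=(".git", "__pycache__")):
    """Yield file paths under folder_path, not descending into skip_dirs."""
    for root, dirs, files in os.walk(folder_path):
        # Assigning to dirs[:] edits the list in place; with the default
        # top-down traversal, os.walk() then skips the removed directories.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            yield os.path.join(root, name)
```

Rebinding the name (dirs = [...]) would not have this effect, because os.walk() only sees changes made to the original list object.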

Collect all files into a list

If you want to store all retrieved files in a list, you can write it like this:
import os

# Store all files in subdirectories in a list
folder_path = "sample_folder"
all_files = []

for root, dirs, files in os.walk(folder_path):
    for file in files:
        all_files.append(os.path.join(root, file))

print("All files in subdirectories:")
print(all_files)

Retrieve only files with a specific extension

For example, if you only want to retrieve .txt files, add the following condition:
import os

# Retrieve files with a specific extension in subdirectories
folder_path = "sample_folder"
txt_files = []

for root, dirs, files in os.walk(folder_path):
    for file in files:
        if file.endswith(".txt"):
            txt_files.append(os.path.join(root, file))

print("Text files:")
print(txt_files)

Recursive retrieval using the glob module

You can easily perform recursive searches using ** from the glob module.
import glob

# Recursively retrieve all files
folder_path = "sample_folder"
all_files = glob.glob(f"{folder_path}/**/*", recursive=True)

print("Files retrieved recursively:")
print(all_files)
It’s also simple to retrieve only files with a specific extension.
import glob

# Recursively retrieve files with a specific extension
folder_path = "sample_folder"
txt_files = glob.glob(f"{folder_path}/**/*.txt", recursive=True)

print("Text files retrieved recursively:")
print(txt_files)

Recursive retrieval using the pathlib module

The rglob() method of the pathlib module provides an intuitive way to retrieve files recursively.
from pathlib import Path

# Recursively retrieve all files
folder_path = Path("sample_folder")
all_files = list(folder_path.rglob("*"))

print("Files retrieved recursively:")
print(all_files)
It’s also easy to specify a particular extension.
from pathlib import Path

# Recursively retrieve files with a specific extension
folder_path = Path("sample_folder")
txt_files = list(folder_path.rglob("*.txt"))

print("Text files retrieved recursively:")
print(txt_files)

Comparison of the methods

Method | Characteristics | Advantages | Considerations
os.walk() | Manually implement recursive file traversal | Highly customizable | Code can become lengthy
glob | Allows concise wildcard-based expressions | Enables recursive searches with short code | Requires crafting appropriate wildcard patterns
pathlib.rglob() | Allows an object-oriented, Pythonic style | Highly readable and fits modern Python code | Only available on Python 3.4 and later

6. Filtering File Names by Specific Conditions and Use Cases

Filtering files by specific conditions helps organize the list of retrieved file names and is useful when performing data operations for a particular purpose. This section explains how to narrow down files in Python using file extensions, file names, and regular expressions.

Filtering by File Extension

To get only files with a specified extension, you can use list comprehensions or conditional checks.
Example: Using the os module
import os

# Get files with the specified extension
folder_path = "sample_folder"
txt_files = [f for f in os.listdir(folder_path) if f.endswith(".txt")]

print("Text files:")
print(txt_files)
Example: Using the pathlib module
With pathlib, the same filter compares each path’s suffix attribute.
from pathlib import Path

# Get files with the .txt extension
folder_path = Path("sample_folder")
txt_files = [f for f in folder_path.iterdir() if f.suffix == ".txt"]

print("Text files:")
print(txt_files)

When File Names Contain a Specific String

To determine whether a file name contains a specific string, use the in operator.
import os

# Get files whose names contain 'report'
folder_path = "sample_folder"
report_files = [f for f in os.listdir(folder_path) if "report" in f]

print("Files containing 'report':")
print(report_files)

Advanced Filtering Using Regular Expressions

Using regular expressions allows you to filter files with more flexible conditions. For example, it’s useful when searching for specific patterns in file names.
import os
import re

# Get files whose names consist only of digits
folder_path = "sample_folder"
pattern = re.compile(r"^\d+$")  # \d matches a digit; names with an extension such as "123.txt" do not match

files = [f for f in os.listdir(folder_path) if pattern.match(f)]

print("File names consisting only of digits:")
print(files)
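File names usually carry an extension, so a pattern often needs to account for it. As a small sketch, assuming a hypothetical naming scheme of digits followed by ".txt", re.fullmatch() can be applied to a list of names (the names below are invented for illustration):

```python
import re

# Hypothetical naming scheme: only digits, followed by a ".txt" extension.
pattern = re.compile(r"\d+\.txt")

names = ["001.txt", "2024.txt", "report1.txt", "123.csv", "42"]

# fullmatch() succeeds only if the entire name fits the pattern,
# so no "^" or "$" anchors are needed.
numbered = [n for n in names if pattern.fullmatch(n)]

print(numbered)  # ['001.txt', '2024.txt']
```

In a real script, the names list would come from os.listdir(folder_path) as in the example above.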

Filtering by File Size

If you want to filter by file size, use os.path.getsize().
import os

# Get files that are 1 MB or larger
folder_path = "sample_folder"
min_size = 1 * 1024 * 1024  # 1 MB in bytes
large_files = [f for f in os.listdir(folder_path) if os.path.getsize(os.path.join(folder_path, f)) >= min_size]

print("Files 1 MB or larger:")
print(large_files)

Practical Examples

1. Group by extension
To group files by extension, you can build a dictionary keyed by extension like this.
import os

# Categorize files by extension
folder_path = "sample_folder"
files_by_extension = {}

for f in os.listdir(folder_path):
    ext = os.path.splitext(f)[1]  # Get the extension
    if ext not in files_by_extension:
        files_by_extension[ext] = []
    files_by_extension[ext].append(f)

print("Grouped by extension:")
print(files_by_extension)

2. Filter by date
To filter by a file’s modification timestamp, use os.path.getmtime().
import os
import time

# Get files updated within the past week
folder_path = "sample_folder"
one_week_ago = time.time() - 7 * 24 * 60 * 60

recent_files = [f for f in os.listdir(folder_path) if os.path.getmtime(os.path.join(folder_path, f)) > one_week_ago]

print("Recently updated files:")
print(recent_files)

Use Cases and Convenience of Each Method

Filter Condition | Usage | Purpose
File extension | f.endswith(".txt") | Classify by file type
Specific substring | "keyword" in f | Find files related to a specific purpose
Regular expressions | re.match(pattern, f) | Search for files with complex name patterns
File size | os.path.getsize() | Detect large files
Modification date/time | os.path.getmtime() | Find recently modified files
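
The conditions in the table can also be combined in a single pass. The sketch below is one possible arrangement, not a standard API: the function find_files and its parameters (suffix, min_size, newer_than) are hypothetical names introduced here.

```python
import os


def find_files(folder_path, suffix="", min_size=0, newer_than=None):
    """Return sorted names of files that satisfy every condition that is set."""
    matches = []
    with os.scandir(folder_path) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip directories
            if suffix and not entry.name.lower().endswith(suffix.lower()):
                continue  # wrong extension
            info = entry.stat()  # one stat result provides both size and mtime
            if info.st_size < min_size:
                continue  # too small
            if newer_than is not None and info.st_mtime <= newer_than:
                continue  # not modified after the cutoff timestamp
            matches.append(entry.name)
    return sorted(matches)
```

For example, find_files("sample_folder", suffix=".txt", min_size=1024, newer_than=time.time() - 7 * 24 * 60 * 60) would list .txt files of at least 1 KB that were modified within the past week (this call needs import time).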

7. Practical Examples: How to Use the Retrieved File Name List

How you use the retrieved file name list varies widely depending on your needs and goals. In this section, we’ll introduce common use cases and several concrete code examples.

Batch Processing of File Contents

This example uses the retrieved file list to read and process the contents of each file.
Example: Read the contents of text files in bulk
import os

# Get text files in the folder
folder_path = "sample_folder"
text_files = [f for f in os.listdir(folder_path) if f.endswith(".txt")]

# Read all file contents in bulk
all_content = ""
for file in text_files:
    with open(os.path.join(folder_path, file), "r", encoding="utf-8") as f:
        all_content += f.read() + "\n"

print("Contents of all text files:")
print(all_content)

Renaming Files

You can use the retrieved file name list to rename files in bulk.
Example: Add a prefix to file names
import os

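# Get files in the folder (directories are skipped so they are not renamed)
folder_path = "sample_folder"
files = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f))]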

# Add a prefix to file names
for file in files:
    old_path = os.path.join(folder_path, file)
    new_path = os.path.join(folder_path, f"new_{file}")
    os.rename(old_path, new_path)

print("Renamed the files.")
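Bulk renames are hard to undo, so it can help to preview them first. The following is a cautious sketch using pathlib; the function name add_prefix and its dry_run parameter are assumptions introduced for illustration.

```python
from pathlib import Path


def add_prefix(folder_path, prefix="new_", dry_run=True):
    """Plan (and optionally perform) prefix renames for files in a folder."""
    folder = Path(folder_path)
    # Build the full plan first so newly renamed files are not revisited.
    planned = []
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.name.startswith(prefix):
            continue  # skip directories and files that already have the prefix
        target = path.with_name(prefix + path.name)
        if target.exists():
            continue  # never overwrite an existing file
        planned.append((path, target))
    if not dry_run:
        for path, target in planned:
            path.rename(target)
    return [(src.name, dst.name) for src, dst in planned]
```

Calling add_prefix("sample_folder") only returns the planned (old name, new name) pairs; passing dry_run=False performs the renames. Skipping names that already start with the prefix also makes the function safe to run twice.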

Saving the File List

It’s also useful to save the retrieved file name list to an external file (such as a text file, or a CSV file that Excel can open) so you can review it later.
Example: Save the file name list to a text file
import os

# Get files in the folder
folder_path = "sample_folder"
files = os.listdir(folder_path)

# Save file names to a text file
with open("file_list.txt", "w", encoding="utf-8") as f:
    for file in files:
        f.write(file + "\n")

print("Saved the file name list.")
Example: Save the file name list as CSV
import os
import csv

# Get files in the folder
folder_path = "sample_folder"
files = os.listdir(folder_path)

# Save file names to a CSV
with open("file_list.csv", "w", encoding="utf-8", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["File Name"])  # Add header
    for file in files:
        writer.writerow([file])

print("Saved the file name list to CSV.")

Creating File Backups

This is an example of using the file name list to create backups of files in a specified folder.
import os
import shutil

# Get files in the folder
source_folder = "sample_folder"
backup_folder = "backup_folder"
os.makedirs(backup_folder, exist_ok=True)

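files = os.listdir(source_folder)

# Copy files to the backup folder (shutil.copy() cannot copy directories, so skip them)
for file in files:
    source_path = os.path.join(source_folder, file)
    if os.path.isfile(source_path):
        shutil.copy(source_path, os.path.join(backup_folder, file))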

print("Backup created.")

Restricting Processing to Specific Files

Use the retrieved file list to run processing only on files that meet specific conditions.
Example: Delete files with a specific extension
import os

# Delete files in the folder with a specific extension
folder_path = "sample_folder"
files = [f for f in os.listdir(folder_path) if f.endswith(".tmp")]

for file in files:
    os.remove(os.path.join(folder_path, file))

print("Deleted unnecessary files.")

Advanced Scenarios

  • Data collection: Gather many CSV files and consolidate their data.
  • Log management: Identify and delete old log files.
  • Image processing: Resize or convert images in a specific folder.
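
As one illustration of the data collection scenario, the sketch below merges the CSV files in a folder into a single file. It assumes that every CSV file is UTF-8 encoded and shares the same header row in the same column order; the function name merge_csv_files is hypothetical.

```python
import csv
from pathlib import Path


def merge_csv_files(folder_path, output_path):
    """Concatenate all CSV files in a folder, keeping a single header row."""
    output = Path(output_path)
    # Collect the sources before opening the output, and exclude the output
    # file itself in case it already exists inside the same folder.
    sources = sorted(
        p for p in Path(folder_path).glob("*.csv")
        if p.resolve() != output.resolve()
    )
    rows_written = 0
    with open(output, "w", encoding="utf-8", newline="") as out_file:
        writer = csv.writer(out_file)
        header_written = False
        for source in sources:
            with open(source, "r", encoding="utf-8", newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # empty file
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

The return value is the number of data rows written, which gives a quick sanity check after a merge.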

8. Troubleshooting: Tips for Resolving Errors

When performing file operations or retrieving filenames within folders, various errors can occur. This section explains common errors and how to resolve them.

Error 1: Folder Not Found

When it occurs: If the specified folder does not exist, functions such as os.listdir() raise FileNotFoundError.
Causes:
  • The folder path is incorrect (for example, a relative path that is resolved against a different working directory).
  • The folder has been deleted.
How to fix:
  • Check whether the folder exists using os.path.exists() or pathlib.Path.exists().
Example: Code to check if a folder exists
import os

folder_path = "sample_folder"

if not os.path.exists(folder_path):
    print(f"Error: Folder not found ({folder_path})")
else:
    print("The folder exists.")
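An alternative to checking in advance is to attempt the operation and handle the exception, which also covers the case where the folder disappears between the check and the listing. This is a minimal sketch; the helper name safe_listdir and the choice to return an empty list on failure are assumptions.

```python
import os


def safe_listdir(folder_path):
    """Return the folder's entries, or an empty list if it cannot be read."""
    try:
        return os.listdir(folder_path)
    except FileNotFoundError:
        print(f"Error: Folder not found ({folder_path})")
    except NotADirectoryError:
        print(f"Error: Not a folder ({folder_path})")
    except PermissionError:
        print(f"Error: No permission to read the folder ({folder_path})")
    return []
```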

Error 2: Permission Error During File Operations

When it occurs: PermissionError is raised when you attempt to read or write a file without the required access.
Causes:
  • No access permissions for the file or folder.
  • The file is locked by another process.
How to fix:
  • Check and correct access permissions.
  • Ensure the file is not in use.
Example: Handling access permission errors
import os

file_path = "sample_folder/sample_file.txt"

try:
    with open(file_path, "r", encoding="utf-8") as f:
        content = f.read()
        print(content)
except PermissionError:
    print(f"Error: You don't have permission to access the file ({file_path})")

Error 3: File Path Too Long

When it occurs: On Windows, errors can occur when a path exceeds the default limit of 260 characters.
How to fix:
  • Enable the Windows setting that supports long paths.
  • Shorten file or folder names.
Example: Extracting only the file name from a long path
import os

# os.path.basename() returns only the final component.
# The result is a file name, not a shorter path to the same file.
long_path = "a/very/long/path/to/a/folder/with/a/long/file_name.txt"
file_name = os.path.basename(long_path)
print(f"File name only: {file_name}")

Error 4: Handling Filenames with Special Characters

When it occurs: Errors can occur if a filename contains special characters (for example: spaces, special symbols, or non-ASCII characters).
How to fix:
  • Normalize filenames.
  • Remove or replace non-ASCII characters.
Example: Code to normalize filenames
import os

file_path = "sample_folder/sample file!.txt"
normalized_file_path = file_path.replace(" ", "_").replace("!", "")
print(f"Normalized filename: {normalized_file_path}")
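For the second fix (removing or replacing non-ASCII characters), one possible approach uses unicodedata from the standard library. This is a sketch with a hypothetical function name, normalize_filename, and a deliberately strict whitelist of allowed characters. Characters that have no ASCII decomposition, such as Japanese text, are dropped entirely, so different names can collapse to the same result.

```python
import re
import unicodedata


def normalize_filename(name):
    """Return an ASCII-only variant of a file name (not a full path)."""
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding to ASCII with errors="ignore" then drops the marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of whitespace with a single underscore.
    ascii_name = re.sub(r"\s+", "_", ascii_name)
    # Keep only letters, digits, dots, underscores, and hyphens.
    return re.sub(r"[^A-Za-z0-9._-]", "", ascii_name)


print(normalize_filename("sample file!.txt"))  # sample_file.txt
print(normalize_filename("résumé 2024.pdf"))   # resume_2024.pdf
```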

Error 5: Out of Memory Error

When it occurs: When processing very large directories, building the complete file list at once can exhaust memory.
How to fix:
  • Process files in smaller batches instead of all at once.
  • Use generators when generating file lists.
Example: Processing using a generator
import os

def get_files(folder_path):
    for root, _, files in os.walk(folder_path):
        for file in files:
            yield os.path.join(root, file)

folder_path = "sample_folder"
for file in get_files(folder_path):
    print(f"Processing: {file}")

Error 6: File Is Locked

When it occurs: A specific application may have the file open, preventing deletion or editing.
How to fix:
  • Identify and terminate the process using the file.
  • Wait until the file is released.

Error 7: UnicodeDecodeError

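When it occurs: UnicodeDecodeError is raised when a file’s bytes are not valid in the encoding passed to open(), for example when a Shift_JIS file is read as UTF-8.
How to fix:
  • Open the file with an explicit encoding that matches the file.
  • If the encoding is unknown, detect it using the third-party chardet library.
Example: Opening a file with a specified encoding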
import os

file_path = "sample_folder/sample_file.txt"

try:
    with open(file_path, "r", encoding="utf-8") as f:
        content = f.read()
        print(content)
except UnicodeDecodeError:
    print(f"Error: Could not decode the file as UTF-8 ({file_path})")
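When installing a detection library is not an option, a simple fallback is to try a short list of candidate encodings in order. The sketch below uses only the standard library; the function name read_text_with_fallback and the default candidates (utf-8, cp932, latin-1) are assumptions to adjust for your data. Note that latin-1 can decode any byte sequence, so as a last resort it never fails but may return garbled text.

```python
def read_text_with_fallback(file_path, encodings=("utf-8", "cp932", "latin-1")):
    """Try each encoding in order and return (text, encoding_used)."""
    for encoding in encodings:
        try:
            with open(file_path, "r", encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding; try the next one
    raise ValueError(f"None of {encodings} could decode {file_path}")
```

Returning the encoding alongside the text makes it easy to log which files were not UTF-8.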

Summary

By using these troubleshooting steps, you can efficiently resolve errors that occur during file operations and improve the reliability of your scripts.

9. Summary

This article explained how to get filenames inside a folder using Python, from basics to practical applications. We reviewed the features and appropriate use cases of each method and summarized which method to choose.

Features of the methods and when to use them

Method | Features | Use cases
os module | Part of the standard library and easy to use. | Ideal for processing small folders or basic file retrieval.
os module | For recursive searches, use os.walk(). | When you need to operate on files including subdirectories.
glob module | Allows flexible pattern searches with wildcards. | When you want to efficiently search by file extension or name patterns.
glob module | Recursive searches are possible with recursive=True. | Searching for files in subdirectories that meet specific criteria.
pathlib module | Enables modern, object-oriented code. | When you’re using Python 3.4+ and readability is important.
pathlib module | You can search files recursively and intuitively with rglob(). | When you want to write concise folder operations including subdirectories.

Recap of usage examples

  • Creating a file list: Use os.listdir(), glob.glob(), or Path.iterdir() to list the items in a folder.
  • Filtering by specific conditions: Use extensions, names, or regular expressions to select only the files you need.
  • Batch processing: Use the obtained file list to streamline reading and writing of file contents.
  • Advanced operations: Renaming, creating backups, deleting unnecessary files, etc.

Importance of troubleshooting

We also touched on common errors that can occur when performing file operations. Being mindful of the following can improve script reliability:
  • Check for the existence of files and folders in advance.
  • Consider access permissions and handling of special characters.
  • Be aware of memory shortages and performance issues when handling large numbers of files.

Benefits of mastering filename retrieval in Python

Retrieving filenames in folders with Python is useful in many scenarios. In particular, it offers the following advantages.
  1. Improved efficiency: Automating routine tasks and data organization.
  2. Flexibility: Ability to operate on files under various conditions.
  3. Scalability: Can handle datasets from small to large scale.

Finally

Python offers a variety of modules and features. By choosing appropriately among the modules introduced here (os, glob, and pathlib), you can make file operations both more efficient and more accurate. We hope this article helps beginners and intermediate users improve their Python file-handling skills. Try applying these techniques in real projects or at work to experience how useful Python can be!