Read CSV File in Python

Overview
There is more than one way to read a CSV file in Python. This article explores the different methods, from the built-in csv module to the pandas library, and walks through each one with detailed examples and explanations.
Scope
This article covers the following topics:
- CSV files and their usage.
- Various ways to parse and read CSV files in Python, along with detailed examples and code.
- Integration of Python loops with the csv module to read CSV files in Python.
This article does not cover the basics of how to use the csv module's functions to read and write CSV files.
What Is a CSV File?
Before exploring the different methods for reading CSV files in Python, let us first understand what exactly a CSV file is.
A CSV file is a simple file format that stores values separated by commas as plain text, rather than in a tabular structure such as a spreadsheet or database. Each line of the file is a data record, and each record consists of one or more comma-separated fields. The format gets its name from these comma-separated values. Python has a built-in module called csv that helps when working with CSV files.
The abbreviation CSV stands for comma-separated values, where the comma is known as the delimiter. While you can also use Python's split() method to separate lines and the data within each line, the csv module makes this easier, for example by correctly handling quoted fields that themselves contain the delimiter.
The diagram below shows what a CSV file actually looks like:
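To make the difference between split() and the csv module concrete, here is a minimal sketch. The sample line is a made-up illustration (not taken from any file in this article) in which one quoted field contains a comma.

```python
import csv

# Hypothetical record: the second field is quoted and contains a comma.
line = '1,"Ritchie, Dennis",C'

# Naive approach: str.split() cuts at every comma, even inside the quotes.
naive = line.split(",")

# csv.reader accepts any iterable of strings and respects the quoting.
parsed = next(csv.reader([line]))

print(naive)   # ['1', '"Ritchie', ' Dennis"', 'C']
print(parsed)  # ['1', 'Ritchie, Dennis', 'C']
```

The naive split produces four pieces and keeps the quote characters, while the csv module returns the three intended fields.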
Basic Use of csv.reader()
Let us now explore the different ways in which we can read CSV files in Python. We begin with csv.reader(), revisiting how it reads and parses a CSV file to refresh the existing knowledge.
Read CSV Files with csv.reader()
Let's read the CSV file Innovators_Of_Lang.csv, shown below, with csv.reader().
Code:
Output:
Explanation:
In the above example, we first open Innovators_Of_Lang.csv with Python's open() function. We then pass the file object to csv.reader(), which returns an iterable reader object.
To iterate over this reader object, we use a for loop and print the contents of each row in the file. Each row is returned as a list of strings, which gives the output shown above, where every value is easily visible.
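The steps described above can be sketched as follows. The sample rows are hypothetical placeholders, and io.StringIO stands in for a real file so that the snippet runs on its own; with a file on disk you would use open(path, newline="") instead.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file on disk.
sample = "SN,Name,Contribution\n1,Guido van Rossum,Python\n2,Dennis Ritchie,C\n"

# io.StringIO behaves like a text file opened with open(path, newline="").
with io.StringIO(sample, newline="") as f:
    reader = csv.reader(f)           # iterable reader object
    rows = [row for row in reader]   # each row is a list of strings

for row in rows:
    print(row)
```

Collecting the rows inside the with block matters: the reader pulls lines lazily, so it can no longer be iterated once the file is closed.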
Read CSV Files with Custom Delimiters
Usually, a comma is the delimiter in a CSV file, but it is not the only option. A space, a pipe (|) or a tab (\t) can serve the same purpose. In the example above, Innovators_Of_Lang.csv used ',' as the delimiter. To read a file that uses a different character, we pass an additional delimiter parameter to the csv.reader() function studied above.
Let's look at the example below, using a version of Innovators_Of_Lang.csv as shown, to understand this better.
Code:
Output:
Explanation:
In the above example, which reads a CSV file with a custom delimiter, we pass the parameter delimiter='\t'. This optional parameter tells the reader object that the fields in the input file are separated by tabs rather than commas.
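As a minimal sketch of the delimiter parameter, the snippet below parses tab-separated text. The sample data is hypothetical and held in memory so the example is self-contained.

```python
import csv
import io

# Hypothetical tab-separated data; "\t" is the tab character.
sample = "SN\tName\tContribution\n1\tGuido van Rossum\tPython\n"

# delimiter tells the reader which character separates the fields.
reader = csv.reader(io.StringIO(sample, newline=""), delimiter="\t")
rows = list(reader)

for row in rows:
    print(row)
```

The delimiter must be a single character, so a pipe would be passed as delimiter="|" in the same way.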
Read CSV Files with Initial Spaces
Some CSV files have a space character right after each delimiter. If we read such a file with csv.reader() as shown above, those spaces appear in the output as part of the values. To solve this, we pass an additional parameter called skipinitialspace.
Let's read the CSV file Customer.csv, shown below, which has initial spaces.
Code:
Output:
Explanation:
As the code shows, whenever a CSV file has initial spaces, we can pass the additional parameter skipinitialspace=True to resolve the issue. With skipinitialspace set to True, the reader object ignores whitespace that immediately follows a delimiter, so those spaces are removed from the values.
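The effect of skipinitialspace can be seen by reading the same text with and without it. The sample below is a hypothetical stand-in for a file with a space after every comma.

```python
import csv
import io

# Hypothetical data with a space after each delimiter.
sample = "SN, Name, City\n1, Asha, Pune\n"

# Default behaviour: the leading space stays inside each value.
with_spaces = list(csv.reader(io.StringIO(sample)))

# skipinitialspace=True drops whitespace directly after the delimiter.
trimmed = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(with_spaces)  # [['SN', ' Name', ' City'], ['1', ' Asha', ' Pune']]
print(trimmed)      # [['SN', 'Name', 'City'], ['1', 'Asha', 'Pune']]
```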
Read CSV Files with Quotes
We may sometimes find that a CSV file contains quotes around sentences or words. To handle them, we make use of the optional parameter called quoting. Depending on how the file is laid out, reading it with a plain csv.reader() call can leave the quotation marks in the output.
Let's use FamousQuotes.csv, shown below, to understand how to read CSV files with quotes in Python:
Code:
Output:
Explanation: Here, we use the optional quoting parameter to describe how quotes are used in the file, so that the quotation marks do not come up in the output. As can be seen from the above code, we pass csv.QUOTE_ALL, a constant defined in the csv module, to the quoting parameter. csv.QUOTE_ALL signifies that every value in the CSV file is enclosed in quotation marks. Note that csv.reader() already removes the quotes around a field by default when the opening quote directly follows the delimiter; of the constants listed below, only csv.QUOTE_NONNUMERIC and csv.QUOTE_NONE change how the reader behaves.
The other predefined constants available in the csv module for the quoting parameter are as follows:
- csv.QUOTE_MINIMAL: Only the values that contain special characters such as the delimiter, the quotechar or any of the characters in lineterminator have quotes around them. This is the default.
- csv.QUOTE_NONNUMERIC: All non-numeric entries in the CSV file have quotes around them. When reading, unquoted values are converted to float.
- csv.QUOTE_NONE: The entries in the CSV file do not have any quotes around them. When reading, quote characters get no special treatment and remain part of the values.
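The following sketch illustrates how these constants affect a reader. The sample strings are hypothetical. It assumes a file where a space follows each comma, which is one common reason quotation marks survive into the output, and combines quoting with skipinitialspace for that case.

```python
import csv
import io

# Hypothetical line: every value is quoted and a space follows each comma.
sample = '"1", "Buddha", "What we think, we become"\n'

# Default settings: the space hides the opening quote from the reader,
# so quotes stay in the values and the sentence is split at its comma.
default_row = next(csv.reader(io.StringIO(sample)))

# Skipping the initial space lets the reader recognise the quoted fields.
clean_row = next(csv.reader(io.StringIO(sample),
                            quoting=csv.QUOTE_ALL,
                            skipinitialspace=True))

# QUOTE_NONNUMERIC: unquoted values are converted to float while reading.
numeric_row = next(csv.reader(io.StringIO('"Asha",30,55.5\n'),
                              quoting=csv.QUOTE_NONNUMERIC))

# QUOTE_NONE: quote characters are kept as ordinary text.
raw_row = next(csv.reader(io.StringIO('"a","b"\n'), quoting=csv.QUOTE_NONE))

print(default_row)  # ['1', ' "Buddha"', ' "What we think', ' we become"']
print(clean_row)    # ['1', 'Buddha', 'What we think, we become']
print(numeric_row)  # ['Asha', 30.0, 55.5]
print(raw_row)      # ['"a"', '"b"']
```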
Read CSV Files Using Dialect
Passing optional parameters such as quoting and skipinitialspace to csv.reader(), as in the examples above, works well when we are dealing with one or two files. When we work with multiple CSV files that share a similar format, repeating the same parameters makes the code redundant and cluttered.
To solve this issue of redundancy, the csv module offers dialect as an optional parameter.
A dialect groups together many formatting parameters, such as delimiter, skipinitialspace, quoting and escapechar, under a single dialect name. We can then pass that one name as a parameter to multiple writer or reader instances.
Let us take Email.csv, shown below, to understand this concept better:
In the above CSV file we see initial spaces, quotes surrounding each entry, and | used as the delimiter. Let us look at the code below to see how a dialect lets us read this file without passing three individual formatting parameters.
Code:
Output:
Explanation: As can be seen in the above example, we have used the csv.register_dialect() function to define the custom dialect. Its syntax is csv.register_dialect(name[, dialect[, **fmtparams]]).
Using a dialect helps to make the program more modular. The name of the custom dialect must be given as a string. The dialect itself can be specified either by passing a subclass of the Dialect class or by individual formatting parameters as keyword arguments.
As seen, we pass dialect='myDialect' when creating the reader object to specify that the reader instance must use that particular dialect. Once a dialect such as myDialect has been registered, we can reuse it to open other files without having to re-specify the CSV format.
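A minimal sketch of registering and reusing a dialect is shown below. The dialect name myDialect comes from the text above; the sample rows and the example.com address are hypothetical, and the data is held in memory rather than read from Email.csv.

```python
import csv
import io

# Register the three formatting parameters once, under a single name.
csv.register_dialect(
    "myDialect",
    delimiter="|",
    skipinitialspace=True,
    quoting=csv.QUOTE_ALL,
)

# Hypothetical data: pipe-delimited, quoted, with a space after each pipe.
sample = '"ID"| "Name"| "Email"\n"1"| "Asha"| "asha@example.com"\n'

# Any reader (or writer) can now refer to the format by name.
rows = list(csv.reader(io.StringIO(sample), dialect="myDialect"))

for row in rows:
    print(row)
```

The same dialect="myDialect" argument can be passed to further csv.reader() or csv.writer() calls for other files with the same layout.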
Read CSV Files with csv.DictReader()
To read a CSV file as dictionaries, that is, in key-value format, we can use objects of the csv.DictReader() class.
Let us consider the CSV file 'profession.csv' below to understand csv.DictReader() in detail:
Code:
Output:
Explanation:
From the above, we can see that the entries in the first row become the dictionary keys, while the entries in the other rows become the dictionary values. In the code, we pass the file to csv.DictReader() and then use a for loop to go through the records, explicitly converting each row with dict() so that every record of the CSV file is printed as a dictionary of key-value pairs.
The syntax of the csv.DictReader() class is csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds).
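The behaviour described above can be sketched as follows. The column names and rows are hypothetical placeholders rather than the actual contents of profession.csv.

```python
import csv
import io

# Hypothetical data; the first row supplies the dictionary keys.
sample = "Name,Profession\nAsha,Engineer\nRavi,Doctor\n"

reader = csv.DictReader(io.StringIO(sample, newline=""))

# dict(row) makes the conversion explicit; on older Python versions
# DictReader yielded OrderedDict objects instead of plain dicts.
records = [dict(row) for row in reader]

for record in records:
    print(record)
```

Each record can then be accessed by column name, for example record["Profession"], instead of by position.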
Reading CSV Files With pandas
Yes, you heard it right! We can use the pandas library to read a CSV file with very little code. Let us use the CSV file below to understand how.
employee.csv
Code:
Output:
Explanation: Here we made use of the pandas library to read the CSV file. The CSV file stores its dates as strings, and pandas can convert these strings to a date type with the help of the parse_dates optional parameter, which defines the columns to be treated as dates. We can also see that the output is labelled differently from the raw file: the index_col optional parameter, as shown in the code, selects the column to use as the row index instead of the default integer index.
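A minimal sketch of the two parameters discussed above is given below. It assumes pandas is installed; the column names and values are hypothetical and do not reproduce employee.csv.

```python
import io

import pandas as pd

# Hypothetical data with a date column stored as text.
sample = (
    "Name,Hire Date,Salary\n"
    "Asha,2015-03-14,50000\n"
    "Ravi,2017-11-02,65000\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    index_col="Name",          # use this column as the row index
    parse_dates=["Hire Date"], # convert this column from str to datetime
)

print(df)
print(df.dtypes)
```

With a real file, the first argument would simply be the file path, for example pd.read_csv("employee.csv", ...).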
Using csv.Sniffer Class
Now let us look at the Sniffer class, which is used to deduce the format of a CSV file.
The Sniffer class offers two main methods:
- sniff(sample, delimiters=None): This method analyzes the given sample of CSV text and returns a Dialect subclass that reflects the parameters it deduced. We can also pass an optional delimiters parameter, a string containing the possible valid delimiter characters.
- has_header(sample): This method analyzes the sample text to judge whether its first row appears to be a row of column headers, and returns True or False based on its analysis.
Let us take office_abc.csv, shown below, to understand this concept better:
Code:
Output:
Explanation:
In the above code, we start by reading the first 64 characters of office_abc.csv with the read() function. We then use the has_header() function to check whether the CSV file has a header and print the result. Next, we use the sniff() function to analyze the sample text and return the Dialect subclass holding all the deduced parameters. With both results in hand, we call the plain csv.reader() function, passing it the CSV file and deduced_dialect. Iterating over the resulting reader gives the output as lists.
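The sequence described above can be sketched as follows. The semicolon-delimited sample is hypothetical and stands in for office_abc.csv; the name deduced_dialect follows the text. Sniffer relies on heuristics, so results on other data may differ.

```python
import csv
import io

# Hypothetical semicolon-delimited data held in memory.
sample = "ID;Name;Office\n1;Asha;Pune\n2;Ravi;Delhi\n3;Meera;Mumbai\n"

f = io.StringIO(sample, newline="")

# Read a small sample (64 characters, as in the text) for analysis.
sample_text = f.read(64)

sniffer = csv.Sniffer()
has_header = sniffer.has_header(sample_text)
deduced_dialect = sniffer.sniff(sample_text, delimiters=";,|")

# Rewind so the reader starts from the beginning of the data.
f.seek(0)
rows = list(csv.reader(f, deduced_dialect))

print(has_header)
print(deduced_dialect.delimiter)
for row in rows:
    print(row)
```

Rewinding with seek(0) is needed because reading the sample moves the file position forward; without it the reader would skip the characters already consumed.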
Conclusion
- A CSV file is a simple plain-text file format in which the values are separated by commas.
- Besides csv.QUOTE_ALL, the csv module has three constants that describe how quotation marks are used, as follows:
  - csv.QUOTE_MINIMAL: Quotes appear only around the values that contain special characters.
  - csv.QUOTE_NONNUMERIC: Quotes appear around the non-numeric entries present in the CSV file.
  - csv.QUOTE_NONE: The entries in the CSV file do not have any quotes around them.
- The Sniffer class offers two major methods:
  - sniff(sample, delimiters=None): Analyzes the given sample of CSV text and returns a Dialect subclass that contains all the deduced parameters.
  - has_header(sample): Analyzes the sample to judge whether its first row is a row of column headers, and returns True or False based on its analysis.