Tuesday, November 22, 2011


How to read in a file in C++



So here's a simple question: what is the correct way to read an entire file in C++?

Different people have different solutions: some use the C API, some the C++ API, and some a variation of tricks with iterators and algorithms. Wondering which method is the fastest, I thought I might as well put the various options to the test, and the results were surprising.

First, let me propose an API for the function. We'll pass the function a C string (char *) containing a filename, and we'll get back a C++ string (std::string) holding the file's contents. If the file cannot be opened, we'll throw an error indicating why. Of course, you're welcome to change these functions to receive and return whatever format you prefer, but this is the prototype we'll be operating on:
std::string get_file_contents(const char *filename);
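The prototype leaves error handling to the caller. As a minimal sketch (the helper name try_load is hypothetical, and the function body borrows the rdbuf technique covered later only so the example compiles on its own), a caller can catch the thrown int and turn it into a message with std::strerror(). Be aware that the standard does not require std::ifstream to set errno when opening fails; in practice it is often set by the underlying OS call, but treat the value as a hint.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Same contract as the prototype: returns the contents or throws errno (an int).
std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
  }
  throw errno;
}

// Hypothetical caller: fills `out` and returns true on success,
// otherwise reports the reason on stderr and returns false.
bool try_load(const char *filename, std::string &out)
{
  try
  {
    out = get_file_contents(filename);
    return true;
  }
  catch (int err)
  {
    std::cerr << filename << ": " << std::strerror(err) << '\n';
    return false;
  }
}
```

Catching int is only needed because the prototype throws errno directly; wrapping it in an exception class would be an equally valid design.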

Our first technique to consider is using the C API to read directly into a string. We'll open a file using fopen(), calculate the file size by seeking to the end, and then size a string appropriately. We'll read the contents into the string, and then return it.
#include <string>
#include <cstdio>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::FILE *fp = std::fopen(filename, "rb");
  if (fp)
  {
    std::string contents;
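    std::fseek(fp, 0, SEEK_END);                       // jump to the end...
    contents.resize(std::ftell(fp));                   // ...so ftell() reports the size
    std::rewind(fp);                                   // back to the beginning
    std::fread(&contents[0], 1, contents.size(), fp);  // read straight into the string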
    std::fclose(fp);
    return(contents);
  }
  throw(errno);
}

I'm dubbing this technique "method C". This is more or less what any proficient C++ programmer who prefers C-style I/O would write.
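Method C trusts every call to succeed. Here is a hedged sketch of the same idea with the results checked (the name get_file_contents_c_checked is just for illustration): it throws errno if fseek() or ftell() fails, since ftell() returns -1L on error, and trims the string to what fread() actually returned. One more caveat: the C standard does not require binary streams to meaningfully support fseek() with SEEK_END, although common desktop platforms do.

```cpp
#include <cerrno>
#include <cstdio>
#include <string>

// Illustrative variant of "method C" that checks each step.
std::string get_file_contents_c_checked(const char *filename)
{
  std::FILE *fp = std::fopen(filename, "rb");
  if (!fp)
  {
    throw errno;
  }

  long size = -1;
  if (std::fseek(fp, 0, SEEK_END) == 0)
  {
    size = std::ftell(fp);  // -1L on failure
  }
  if (size < 0)
  {
    int err = errno;  // save it before fclose() can change it
    std::fclose(fp);
    throw err;
  }
  std::rewind(fp);

  std::string contents(static_cast<std::size_t>(size), '\0');
  std::size_t got = 0;
  if (size > 0)
  {
    got = std::fread(&contents[0], 1, contents.size(), fp);
  }
  contents.resize(got);  // the file may have shrunk since ftell()
  std::fclose(fp);
  return contents;
}
```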

The next technique we'll review is basically the same idea, but using C++ streams instead.
#include <fstream>
#include <string>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::string contents;
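    in.seekg(0, std::ios::end);              // seek to the end...
    contents.resize(in.tellg());             // ...so tellg() gives the file size
    in.seekg(0, std::ios::beg);              // back to the beginning
    in.read(&contents[0], contents.size());  // read straight into the string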
    in.close();
    return(contents);
  }
  throw(errno);
}

I'm dubbing this technique "method C++". Again, this is a more or less straightforward C++ implementation based on the same principles as before.
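The call in.read(&contents[0], contents.size()) passes an unsigned size where read() expects a signed std::streamsize; the implicit conversion works, but some compilers warn about it. A hedged sketch of method C++ with that cast made explicit and the other results checked (the name get_file_contents_cpp_checked is illustrative, and throwing EIO when tellg() fails is an arbitrary choice, since streams do not report a reason):

```cpp
#include <cerrno>
#include <cstddef>
#include <fstream>
#include <string>

// Illustrative variant of "method C++" with explicit checks and casts.
std::string get_file_contents_cpp_checked(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (!in)
  {
    throw errno;
  }

  in.seekg(0, std::ios::end);
  std::streamoff size = in.tellg();  // tellg() yields -1 on failure
  if (size < 0)
  {
    throw EIO;  // arbitrary error code for this sketch
  }
  in.seekg(0, std::ios::beg);

  std::string contents(static_cast<std::size_t>(size), '\0');
  if (size > 0)
  {
    // read() takes a signed std::streamsize; size() is unsigned, so cast explicitly.
    in.read(&contents[0], static_cast<std::streamsize>(contents.size()));
    contents.resize(static_cast<std::size_t>(in.gcount()));  // bytes actually read
  }
  return contents;
}
```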

The next technique people consider is using istreambuf_iterator. This iterator is designed for really fast iteration out of stream buffers (files) in C++.
#include <fstream>
#include <streambuf>
#include <string>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    return(std::string((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>()));
  }
  throw(errno);
}

This method is liked by many because of how little code is needed to implement it, and you can read a file directly into all sorts of containers, not just strings. The method was also popularized by the Effective STL book. I'm dubbing the technique "method iterator".
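Because the iterator pair works with any container that has a range constructor, the same method can target something other than a string. A minimal sketch reading into a std::vector<char> (the function name get_file_bytes is hypothetical):

```cpp
#include <cerrno>
#include <fstream>
#include <iterator>
#include <vector>

// "Method iterator" with a different target container.
std::vector<char> get_file_bytes(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    // The extra parentheses mirror the original; they matter when this is
    // written as a variable declaration (the "most vexing parse").
    return std::vector<char>((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
  }
  throw errno;
}
```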

Some have looked at the last technique and felt it could be optimized further: if the string knows in advance how big it needs to be, it will reallocate less. So the idea is to reserve the size of the string, then pull the data in.
#include <fstream>
#include <streambuf>
#include <string>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::string contents;
    in.seekg(0, std::ios::end);
    contents.reserve(in.tellg());
    in.seekg(0, std::ios::beg);
    contents.assign((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    in.close();
    return(contents);
  }
  throw(errno);
}

I will call this technique "method assign", since it uses the string's assign function.

Some have questioned the previous function, as assign() in some implementations may very well replace the internal buffer, and therefore not benefit from reserving. Better to call push_back() instead, which will keep the existing buffer if no reallocation is needed.
#include <fstream>
#include <streambuf>
#include <string>
#include <algorithm>
#include <iterator>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::string contents;
    in.seekg(0, std::ios::end);
    contents.reserve(in.tellg());
    in.seekg(0, std::ios::beg);
    std::copy((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>(), std::back_inserter(contents));
    in.close();
    return(contents);
  }
  throw(errno);
}

Combining std::copy() and std::back_inserter(), we can achieve our goal. I'm labeling this technique "method copy".

Lastly, some want to try another approach entirely. C++ streams can copy very quickly into another stream when operator<< is given the source's internal buffer (rdbuf()). Therefore, we can copy directly into a string stream, and then return the string that the string stream holds.
#include <fstream>
#include <sstream>
#include <string>
#include <cerrno>

std::string get_file_contents(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (in)
  {
    std::ostringstream contents;
    contents << in.rdbuf();    // stream-to-stream copy of the whole buffer
    in.close();
    return(contents.str());    // str() returns a copy of the accumulated text
  }
  throw(errno);
}

We'll call this technique "method rdbuf".


Now which is the fastest method to use if all you actually want to do is read the file into a string and return it? The exact speeds in relation to each other may vary from one implementation to another, but the overall margins between the various techniques should be similar.

I conducted my tests with libstdc++ and GCC 4.6; what you see may vary.

I tested with multi-megabyte files, reading them in one after another, repeated the tests a dozen times, and averaged the results.
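The timing harness itself isn't shown. A minimal sketch of how such a measurement could be taken, assuming wall-clock timing with std::chrono::steady_clock and a hypothetical average_pass_ms() helper (this is not the harness that produced these numbers):

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical harness: reads every file in `files` once per pass, repeats
// `passes` times, and returns the mean wall-clock time of a pass in ms.
template <typename ReadFn>
double average_pass_ms(ReadFn read, const std::vector<std::string> &files, int passes)
{
  using Clock = std::chrono::steady_clock;
  double total_ms = 0.0;
  std::size_t bytes = 0;  // consume results so the reads are not optimized away
  for (int p = 0; p < passes; ++p)
  {
    const Clock::time_point start = Clock::now();
    for (const std::string &name : files)
    {
      bytes += read(name.c_str()).size();
    }
    total_ms += std::chrono::duration<double, std::milli>(Clock::now() - start).count();
  }
  volatile std::size_t sink = bytes;
  (void)sink;
  return passes > 0 ? total_ms / passes : 0.0;
}
```

For example, average_pass_ms(get_file_contents, files, 12) would mirror a dozen repetitions over a list of file names.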


Method    Duration
C         24.5
C++       24.5
Iterator  64.5
Assign    68
Copy      62.5
Rdbuf     32.5


Ordered by speed:


Method    Duration
C/C++     24.5
Rdbuf     32.5
Copy      62.5
Iterator  64.5
Assign    68


These results are rather interesting. There was no speed difference at all between using the C and the C++ API to read a file. This should be obvious to us all, yet many people strangely think that the C API has less overhead. The straightforward vanilla methods were also faster than anything involving iterators.

C++ stream-to-stream copying is really fast. It probably took only slightly longer than the vanilla methods because of the reallocations it needs. If you're copying from one disk file to another, though, you probably want to consider this option and go directly from the input stream to the output stream.
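A minimal sketch of the direct file-to-file route (copy_file is a hypothetical name; it is not std::filesystem::copy_file), which skips the intermediate std::string entirely:

```cpp
#include <cerrno>
#include <fstream>

// Copy one file to another, stream to stream, with no intermediate string.
void copy_file(const char *from, const char *to)
{
  std::ifstream in(from, std::ios::in | std::ios::binary);
  if (!in)
  {
    throw errno;
  }
  std::ofstream out(to, std::ios::out | std::ios::binary | std::ios::trunc);
  if (!out)
  {
    throw errno;
  }
  out << in.rdbuf();  // the streambuf overload copies until end of file
}
```

One thing to watch if you add a check afterwards: when the source is empty, operator<< inserts no characters and sets failbit on out, so a failed state there does not necessarily mean an I/O error.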

The istreambuf_iterator methods, while popular and concise, are actually rather slow. Sure, they're faster than istream_iterators (with whitespace skipping turned off), but they can't compete with more direct methods.

A C++ string's assign() member function, at least in libstdc++ (at the time of this writing), seems to throw away the existing buffer, so reserving and then assigning is rather useless. On the other hand, reading directly into a string, or into a different container for that matter, isn't necessarily the optimal solution where iterators are concerned. Using the external std::copy() function along with back-inserting after reserving is faster than straight-up initialization. You might want to consider this method for inserting into other containers too. In fact, I found that std::copy() of istreambuf_iterators with a back inserter into a std::deque was faster than straight-up initialization (81 vs 88.5), even though a deque cannot reserve room in advance (nor would that make sense for a deque).
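The two deque variants compared above could look roughly like this (both function names are hypothetical; this is a sketch of the comparison, not the code that produced the 81 vs 88.5 figures):

```cpp
#include <algorithm>
#include <cerrno>
#include <deque>
#include <fstream>
#include <iterator>

// Straight-up initialization: the range constructor consumes the iterators.
std::deque<char> load_deque_init(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (!in)
  {
    throw errno;
  }
  return std::deque<char>((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
}

// std::copy() with std::back_inserter(); there is no reserve() for a deque.
std::deque<char> load_deque_copy(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (!in)
  {
    throw errno;
  }
  std::deque<char> contents;
  std::copy(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>(),
            std::back_inserter(contents));
  return contents;
}
```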

I also found this to be a cute way to get a file into a container backwards, even though a deque is rather useless for working with file contents.
std::deque<char> contents;
std::copy((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>(), std::front_inserter(contents));
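The two-line snippet assumes an already-open stream named in. A self-contained sketch around it (load_reversed is a hypothetical name) shows why the result comes out backwards: std::front_inserter() calls push_front() for every character, so the last byte read ends up first.

```cpp
#include <algorithm>
#include <cerrno>
#include <deque>
#include <fstream>
#include <iterator>

// Reads a file into a deque in reverse byte order via push_front().
std::deque<char> load_reversed(const char *filename)
{
  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (!in)
  {
    throw errno;
  }
  std::deque<char> contents;
  std::copy(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>(),
            std::front_inserter(contents));
  return contents;
}
```

A file containing abc would therefore load as c, b, a.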

Now go out there and speed up your applications!

If there's any demand, I'll see about performing these tests with other C++ implementations.

80 comments:

insane coder said...

I tested with clang LLVM 3.0 and got the following:
C/C++: 7.5
rdbuf: 31.5
copy: 97
assign: 102
iter: 110

The iterator tests also varied in their running times, between 90 and 120, whereas for the other methods here, and for all methods with GCC, run-to-run variation was only 1-2.

As with GCC, LLVM is smart enough to see that the reserve() followed by assign() makes the former a no-op and optimizes it out.

insane coder said...

Okay, I tested with Visual C++ 2005.

C: 18.3
C++: 21
rdbuf: 199
iter: 209.3
assign: 221
copy: 483.5

Something tells me that either their STL isn't designed too well, or the compiler really isn't smart enough to optimize all that template code properly.

Results here were also very erratic. Times between tests varied anywhere from 2 to 40.

I'll try to test 2010 later.

Freddie Witherden said...

Interesting; I am surprised by the poor performance of the 'heavier' solutions although the naff VS performance /may/ be due to checked iterators. (It has been a long time since I've used VS but IIRC it used to enable checked iterators by default.)

Might be worth comparing the performance to a mmap + memcpy type solution (I think boost provides a memory mapped file wrapper somewhere). Or to see how EKOPath performs which uses the Apache STL.

Regards, Freddie.
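For comparison with the mmap + memcpy idea, here is a POSIX-only sketch that calls mmap() directly rather than going through a Boost wrapper (the function name is hypothetical, and none of this is standard C++):

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// POSIX-only: map the file read-only, memcpy it into a std::string, unmap.
std::string get_file_contents_mmap(const char *filename)
{
  int fd = open(filename, O_RDONLY);
  if (fd < 0)
  {
    throw errno;
  }

  struct stat st;
  if (fstat(fd, &st) != 0)
  {
    int err = errno;
    close(fd);
    throw err;
  }

  std::string contents;
  const std::size_t size = static_cast<std::size_t>(st.st_size);
  if (size > 0)  // mmap() with a length of 0 fails with EINVAL
  {
    void *map = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED)
    {
      int err = errno;
      close(fd);
      throw err;
    }
    contents.resize(size);
    std::memcpy(&contents[0], map, size);
    munmap(map, size);
  }
  close(fd);
  return contents;
}
```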

insane coder said...

Hi Freddie,

If I really wanted the fastest possible way to read in a file, I would be using POSIX 2008.

First I'd open the file, get its size, then tell the OS to optimize for a sequential read. Then I'd create a buffer which is in sync with the blocking factor of the partition.

Doing so, I'd easily cut the best time here in half. But I'm focusing here on standard C++.
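One possible reading of that description as code, offered as a sketch rather than the author's implementation: get the size with fstat(), hint sequential access with posix_fadvise(), and read in chunks of st_blksize (the filesystem's preferred I/O size). The function name and the 4096-byte fallback are assumptions, and posix_fadvise() is not available on every platform.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// POSIX sketch: size via fstat(), sequential hint, block-sized reads.
std::string get_file_contents_posix(const char *filename)
{
  int fd = open(filename, O_RDONLY);
  if (fd < 0)
  {
    throw errno;
  }

  struct stat st;
  if (fstat(fd, &st) != 0)
  {
    int err = errno;
    close(fd);
    throw err;
  }

  // Advisory only; it returns an error number (not -1/errno), ignored here.
  posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

  std::string contents(static_cast<std::size_t>(st.st_size), '\0');
  const std::size_t chunk = st.st_blksize > 0 ? static_cast<std::size_t>(st.st_blksize) : 4096;
  std::size_t done = 0;
  while (done < contents.size())
  {
    const std::size_t want = std::min(chunk, contents.size() - done);
    ssize_t got = read(fd, &contents[done], want);
    if (got < 0)
    {
      if (errno == EINTR)
      {
        continue;  // interrupted by a signal; retry
      }
      int err = errno;
      close(fd);
      throw err;
    }
    if (got == 0)
    {
      break;  // the file shrank while we were reading
    }
    done += static_cast<std::size_t>(got);
  }
  contents.resize(done);
  close(fd);
  return contents;
}
```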

insane coder said...

Visual C++ 2010:
C: 16.5
C++: 20.4
rdbuf: 176.2
assign: 222.8
iter: 224.4
copy: 320

Results with this were also less erratic. Variance between passes was now 2-10.

Eitan said...

Were these timings done with a warm cache or cold cache? Was cron disabled? Was it done with the network card plugged in? What about the filesystems mounted read only?

There are still too many variables to take these numbers as usable.

insane coder said...

Hi Eitan,

Tests were done from a RAM drive on an isolated machine with no cron. The tests were also run in random orders. But that's beside the point. The numbers are not meant to indicate exact speeds of anything. They are meant to indicate which method averages better performance relative to the others.

Eitan said...

"Averages better" is meaningless statistically. The question to ask is "Are these numbers significantly different at 95% confidence?" In order to get that answer you need more data than just the average (i.e., the standard deviation).

insane coder said...

I already mentioned the standard deviation and variance.

Eitan said...

Understood. I was just commenting on "The numbers are to indicate which method averages better performance relative to others."
In general your methodology is better than most people's that I've seen :)

insane coder said...

A follow up has been posted.

Anonymous said...

Thanks for this wonderful blog! :)
Can you also provide us an example on how to quickly write a huge file? Oh and how to quickly parse through a huge file.

insane coder said...

Parsing is outside the scope of the article, and when writing, size matters not.

When you want to write an array or a contiguous container, you just write its contents; no trickery will improve that situation (except for informing the OS of a sequential write in POSIX 2008).

If your data is non-contiguous, then you'll need some kind of loop to write it all.
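A minimal sketch of both cases with std::ofstream (the names put_file_contents and put_file_chunks are hypothetical, and a std::vector<std::string> stands in for "non-contiguous data"): a contiguous buffer goes out in one write() call, while separate pieces need a loop.

```cpp
#include <cerrno>
#include <fstream>
#include <string>
#include <vector>

// Contiguous data: a single write() covers everything.
void put_file_contents(const char *filename, const std::string &data)
{
  std::ofstream out(filename, std::ios::out | std::ios::binary | std::ios::trunc);
  if (!out)
  {
    throw errno;
  }
  out.write(data.data(), static_cast<std::streamsize>(data.size()));
  if (!out)
  {
    throw errno;
  }
}

// Non-contiguous data (here, separate chunks): loop and write each piece.
void put_file_chunks(const char *filename, const std::vector<std::string> &chunks)
{
  std::ofstream out(filename, std::ios::out | std::ios::binary | std::ios::trunc);
  if (!out)
  {
    throw errno;
  }
  for (const std::string &chunk : chunks)
  {
    out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
  }
  if (!out)
  {
    throw errno;
  }
}
```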

Unknown said...

Hello, thank you for this article. I was leaning toward the assign method, but I'm going with the C++ method instead now. Have you tested your results with GCC 4.8? I would love to see a follow-up where you do the same tests with today's implementations.

John Calcote said...

I'm interested in your thoughts on http://stackoverflow.com/questions/1042940/writing-directly-to-stdstring-internal-buffers/1043318#1043318.

In summary: This article discusses the incorrect assumption by some programmers that you can write to the pointer returned by c_str() AND it also discusses how one should not treat the address of a character reference (string[]) as a location that can be written to. It provides the reasons why - those being that the internal format of the string may not be contiguous memory, even though c_str() returns a pointer to contiguous memory.

However it doesn't discuss using &str[0] as a writable buffer pointer when you've resized an empty string. In other words, it would be difficult to conceive of an implementation that would resize a null string to a set of non-contiguous buffers. I presume this is why you feel it's safe to use str.resize() followed by a write to &str[0].

insane coder said...

My thoughts are that more people need to read Effective C++.

&str[0] is safe, and mentioned in C++ 2003+.

Writing to a c_str() is a very bad idea. It returns a pointer to a const for a reason.

I did not base my decision on feelings of safety, but rather real documentation. I highly advise reading the standard material, and The C++ Programming Language, and Effective C++.
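A small sketch of the pattern under discussion (fill_via_buffer is a hypothetical name): resize first, then write through &s[0] for at most size() characters. C++11 explicitly guarantees that a std::string's characters are stored contiguously, and C++17 adds a non-const data() overload for the same purpose. Writing through c_str() remains off-limits.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Use a resized std::string as a writable buffer of exactly n characters.
std::string fill_via_buffer(const char *src, std::size_t n)
{
  std::string s;
  s.resize(n);  // the string now owns n writable characters
  if (n > 0)
  {
    std::memcpy(&s[0], src, n);  // never write through s.c_str()
  }
  return s;
}
```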

Unknown said...

Interesting. In a similar problem, I am trying to read the contents of a text file continuously in VS C++, while a Python program changes that file at random intervals. VS C++ does not pick up the contents except the first time. Can anyone please tell me how to do it?

James Hawk III said...

Wouldn't it have been more efficient to pass the target string via the call to avoid making a copy of the string during the return? At the time you wrote this there weren't any move semantics available in C++, and even now I'm not sure move semantics would kick in.

Or is there something about std::string assignment operators that makes copy operations not produce intermediate duplicates?

insane coder said...

Modern C++ compilers optimize out duplicate creation from return.

http://en.wikipedia.org/wiki/Return_value_optimization

Unknown said...

My problem solver.
I was trying to read a full file without checking for EOF, and this blog solved my issue in no time. Hey insane blogger, can you explain, or give a reference for, the exact operations you are performing in your last C++ method?

insane coder said...

Pretty much everything here is explained.

Is there a point in particular that you feel needs more clarification?

Unknown said...

Great Blog.....Thanks

hitesh kumar said...

Find Length of String in C++

Thanks for this article

Unknown said...

Everything seemed as I would predict, no surprises.

Though you did not include the fastest and simplest C standard way, which would have been using handles (open rather than fopen) to bypass the tiny overhead of the file buffer.

insane coder said...

Hello David Jay,

I did include it in my larger follow up article as POSIX: http://insanecoding.blogspot.com/2011/11/reading-in-entire-file-at-once-in-c.html

insane coder said...

Hello Hitesh,

I'm not sure what the point in your link was, but whoever wrote that code should be drawn and quartered.

A) Because they're using gets(). See: http://insanecoding.blogspot.com/2007/03/what-to-do-about-gets.html
B) Because they're using an int to store the return value from strlen(). See: http://insanecoding.blogspot.co.il/2010/03/does-anyone-understand-types-and.html
C) Because they're using the non-standard iostream.h instead of iostream.

But mostly for A.

Glory said...

Your article clears my all doubts. Nice article. Thanks for sharing such useful information.


Santosh said...

A nice article. Thanks for sharing.

Faguss said...

in.read(&contents[0], contents.size());
String size() returns an unsigned value while read() requires a signed value.
