## Saturday, November 3, 2007

### PATH_MAX simply isn't

Many C/C++ programmers at some point run into a limit known as PATH_MAX. Basically, if you have to keep track of paths to files/directories, how big does your buffer have to be?
Most operating systems and file systems I've seen limit a filename, or any single path component, to 255 bytes or so. But a full path is a different matter.

Many programmers will immediately tell you that if your buffer is PATH_MAX, or PATH_MAX+1 bytes, it's long enough. A good C++ programmer of course would use C++ strings (std::string or similar with a particular API) to avoid any buffer length issues. But even when dynamic strings take care of the nitty-gritty issue of how long your buffers need to be, they only solve half the problem.

Even a C++ programmer may at some point want to call the getcwd() or realpath() (_fullpath() on Windows) functions, which take a pointer to a writable buffer rather than a C++ string, and which, according to the standard, don't do their own allocation. Even the implementations that do their own allocation very often just allocate PATH_MAX bytes.

getcwd() is a function to return what the current working directory is. realpath() can take a relative or absolute path to any filename, containing .. or levels of /././. or extra slashes, and symlinks and the like, and return a full absolute path without any extra garbage. These functions have a flaw though.

The flaw is that PATH_MAX simply isn't. Each system can define PATH_MAX to whatever size it likes. On my Linux system it's 4096, on my OpenBSD system it's 1024, and on Windows it's 260.
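
To see what a given system claims, one can ask it at run time as well as at compile time. The following is only a sketch: `reported_path_max` is a name made up for this illustration, and it relies on the POSIX pathconf() call, which the text above does not use. Whatever number comes back is what the system advertises, not a limit on the paths that can actually exist.

```cpp
#include <unistd.h>  // pathconf, _PC_PATH_MAX

// Hypothetical helper: returns the path length limit that pathconf()
// reports for the file system holding `dir`. A result of -1 means the
// system reports no fixed limit, or the query failed.
long reported_path_max(const char *dir)
{
  return pathconf(dir, _PC_PATH_MAX);
}
```

Calling `reported_path_max("/")` and comparing it with the compile-time PATH_MAX from `<limits.h>` is a quick way to check how the two relate on a particular machine.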

Now performing a test on my Linux system, I noticed that it limits a path component to 255 characters on ext3, but it doesn't stop me from making as many nested ones as I like. I successfully created a path 6000 characters long. Linux does absolutely nothing to stop me from creating such a large path, nor from mounting one large path on another. Running getcwd() in such a large path, even with a huge buffer, fails, since it doesn't work with anything past PATH_MAX.

Even a commercial OS like Mac OS X defines it as 1024, but tests show you can create a path several thousand characters long. Interestingly enough, OSX's getcwd() will properly identify a path which is larger than its PATH_MAX if you pass it a large enough buffer with enough room to hold all the data. This is possible, because the prototype for getcwd() is:

    char *getcwd(char *buf, size_t size);

So a smart getcwd() can work if there's enough room. But unfortunately, there is no way to determine how much space you actually need, so you can't allocate it in advance. You'd have to keep allocating larger and larger buffers hoping one of them will finally work, which is absurd.
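
For illustration, the "keep allocating larger buffers" approach looks roughly like the sketch below. The helper name `cwd_by_growing_buffer`, the starting size, and the 1 MiB cap are all arbitrary choices made for this example. It only helps on systems whose built-in getcwd() is willing to go past PATH_MAX when given room; where the built-in one refuses, this fails too.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <unistd.h>  // the built-in getcwd()

// Hypothetical helper: retries the built-in getcwd() with a doubling
// buffer for as long as it fails with ERANGE (buffer too small).
bool cwd_by_growing_buffer(std::string& path)
{
  const std::size_t max_size = 1024 * 1024;  // arbitrary cap so the loop always ends
  std::vector<char> buf(256);                // arbitrary starting guess
  while (buf.size() <= max_size)
  {
    if (::getcwd(&buf[0], buf.size()))
    {
      path = &buf[0];
      return true;
    }
    if (errno != ERANGE) { return false; }   // a failure that a bigger buffer won't fix
    buf.resize(buf.size() * 2);
  }
  return false;
}
```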

Since a path can be longer than PATH_MAX, the define is useless, writing code based off of it is wrong, and the functions that require it are broken.

An exception to this is Windows. It doesn't allow any paths to be created larger than 260 characters. If the path was created on a partition from a different OS, Windows won't allow anything to access it. It sounds strange that such a small limit was chosen, considering that FAT has no such limit imposed, and NTFS allows paths to be 32768 characters long. I can easily imagine someone with a sizable audio collection having a 300+ character path like so:
"C:\Documents and Settings\Jonathan Ezekiel Cornflour\My Documents\My Music\My Personal Rips\2007\Technological\Operating System Symphony Orchestra\The GNOME Musical Men\I Married Her For Her File System\You Don't Appreciate Marriage Until You've Noticed Tax Pro's Wizard For Married Couples.Track 01.MP5"

Before we forget, here's the prototype for realpath:

    char *realpath(const char *file_name, char *resolved_name);

Now looking at that prototype, you should immediately ask yourself: where's the size value for resolved_name? We don't want a buffer overflow! That is why OSs implement it based on the PATH_MAX define. The documentation spells out the requirement: "The resolved_name argument must refer to a buffer capable of storing at least PATH_MAX characters."

Which basically means it can never work on a large path, and no clever OS can work around that, unless it actually checks how much RAM is allocated on that pointer using an OS specific method, if one is available.

For these reasons, I've decided to implement getcwd() and realpath() myself. We'll discuss the exact specifics of realpath() next time. For now, we will focus on how one can make their own getcwd().

The idea is to walk up the tree from the working directory, till we reach the root, along the way noting which path component we just went across.
Every modern OS has a stat() function which can take a path component and return information about it, such as when it was created, which device it is located on, and the like. All these OSs except for Windows return the fields st_dev and st_ino which together can uniquely identify any file or directory. If those two fields match the data retrieved in some other way on the same system, you can be sure they're the same file/directory.
To start, we determine the unique ID for . and /. Once we have those, we can construct our loop. At each step, while the current directory doesn't equal the root, we change directory to .., then scan it (using opendir()+readdir()+closedir()) for an entry with the same ID as the directory we just left. Once a matching ID is found, we record its name as the path component for the current level, and move up one.
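
The identity comparison described above can be sketched in isolation. The helper names `get_file_id` and `same_file` are invented for this example; only stat() and its st_dev and st_ino fields come from the system.

```cpp
#include <sys/types.h>
#include <sys/stat.h>
#include <utility>

typedef std::pair<dev_t, ino_t> file_id;

// Hypothetical helper: fills `id` with the (device, inode) pair for
// `path`. Returns false if stat() fails.
bool get_file_id(const char *path, file_id& id)
{
  struct stat sb;
  if (stat(path, &sb)) { return false; }
  id = file_id(sb.st_dev, sb.st_ino);
  return true;
}

// Two names refer to the same file or directory when both pairs match.
bool same_file(const char *a, const char *b)
{
  file_id id_a, id_b;
  return get_file_id(a, id_a) && get_file_id(b, id_b) && id_a == id_b;
}
```

For example, `same_file("/", "/.")` is true, since both names lead to the root directory.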

Code demonstrating this in C++ is as follows:

    #include <cstring>      //strcmp()
    #include <string>
    #include <utility>      //std::pair
    #include <vector>
    #include <dirent.h>     //opendir(), readdir(), closedir()
    #include <fcntl.h>      //open()
    #include <sys/stat.h>   //stat(), lstat(), fstat()
    #include <sys/types.h>  //dev_t, ino_t
    #include <unistd.h>     //chdir(), fchdir(), close()

    bool getcwd(std::string& path)
    {
      typedef std::pair<dev_t, ino_t> file_id;

      bool success = false;
      int start_fd = open(".", O_RDONLY); //Keep track of start directory, so can jump back to it later
      if (start_fd != -1)
      {
        struct stat sb;
        if (!fstat(start_fd, &sb))
        {
          file_id current_id(sb.st_dev, sb.st_ino);
          if (!stat("/", &sb)) //Get info for root directory, so we can determine when we hit it
          {
            std::vector<std::string> path_components;
            file_id root_id(sb.st_dev, sb.st_ino);

            while (current_id != root_id) //If they're equal, we've obtained enough info to build the path
            {
              bool pushed = false;

              if (!chdir("..")) //Keep recursing towards root each iteration
              {
                DIR *dir = opendir(".");
                if (dir)
                {
                  dirent *entry;
                  while ((entry = readdir(dir))) //We loop through each entry trying to find where we came from
                  {
                    if ((strcmp(entry->d_name, ".") && strcmp(entry->d_name, "..") && !lstat(entry->d_name, &sb)))
                    {
                      file_id child_id(sb.st_dev, sb.st_ino);
                      if (child_id == current_id) //We found where we came from, add its name to the list
                      {
                        path_components.push_back(entry->d_name);
                        pushed = true;
                        break;
                      }
                    }
                  }
                  closedir(dir);

                  if (pushed && !stat(".", &sb)) //If we have a reason to continue, we update the current dir id
                  {
                    current_id = file_id(sb.st_dev, sb.st_ino);
                  }
                }//Else, Uh oh, can't read information at this level
              }
              if (!pushed) { break; } //If we didn't obtain any info this pass, no reason to continue
            }

            if (current_id == root_id) //Unless they're equal, we failed above
            {
              //Built the path, will always end with a slash
              path = "/";
              for (std::vector<std::string>::reverse_iterator i = path_components.rbegin(); i != path_components.rend(); ++i)
              {
                path += *i+"/";
              }
              success = true;
            }
            fchdir(start_fd);
          }
        }
        close(start_fd);
      }

      return(success);
    }

Before we accept that as the de facto method to use in your application, let us discuss the flaws.

As mentioned above, it doesn't work on Windows, but a simple #ifdef for Windows can just make it a wrapper around the built in getcwd() with a local buffer of size PATH_MAX, which is fine for Windows, and pretty much no other OS.

This function uses the name getcwd() which can conflict with the built in C based one which is a problem for certain compilers. The fix is to rename it, or put it in its own namespace.

Next, the built in getcwd() implementations I checked only have a trailing slash on the root directory. I personally like having the slash appended, since I'm usually concatenating a filename onto it, but note that if you're not using it for concatenation, but to pass to functions like access(), stat(), opendir(), chdir(), and the like, an OS may not like doing the call with a trailing slash. I've only noticed that being an issue with DJGPP and a few functions. So if it matters to you, the loop near the end of the function can easily be modified to not have the trailing slash, except in the case that the root directory is the entire path.
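
As a rough sketch of that modification, the path-building step could be pulled out into a helper like the one below. The name `join_without_trailing_slash` is made up for this example, and it assumes the components are stored deepest directory first, as path_components is in the function above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the path from components stored deepest
// directory first. Only the root directory itself ends with a slash.
std::string join_without_trailing_slash(const std::vector<std::string>& components)
{
  if (components.empty()) { return "/"; } //The root directory is the entire path

  std::string path;
  for (std::vector<std::string>::const_reverse_iterator i = components.rbegin(); i != components.rend(); ++i)
  {
    path += "/" + *i; //Slash goes before each component instead of after
  }
  return path;
}
```

With the components "bob" then "home" pushed in that order, this yields "/home/bob" rather than "/home/bob/".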

This function also changes the directory in the process, so it's not thread safe. But then again, many built in implementations aren't thread safe either. If you use threads, calculate all the paths you need prior to creating the threads. That is probably a good idea anyway: keep using path names based off of your absolute directories, instead of changing directories elsewhere during the main execution of the program. Otherwise, you'll have to use a mutex around the call, which is also a valid option.
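
The mutex option might look something like the following sketch. It assumes a compiler with std::mutex (C++11), which is newer than the rest of the code here, and the names `cwd_mutex` and `with_cwd_locked` are invented for the example. It only helps if every piece of code that reads or changes the working directory goes through the same lock.

```cpp
#include <mutex>
#include <string>

// Hypothetical: one process-wide lock guarding the working directory.
std::mutex cwd_mutex;

// Runs `func(path)` while holding the lock. `func` is expected to be a
// directory-changing routine such as the getcwd() replacement above.
template <typename F>
bool with_cwd_locked(F func, std::string& path)
{
  std::lock_guard<std::mutex> lock(cwd_mutex);
  return func(path);
}
```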

There could also be the issue that some level of the path isn't readable. That can happen on UNIX, where to enter a directory, one only needs execute permission, and not read permission. I'm not sure what one can do in that case, except maybe fall back on the built in one hoping it does some magical Kernel call to get around it. If anyone has any advice on this one, please post about it in the comments.

Lastly, this function is written in C++, which is annoying for C users. The std::vector can be replaced with a linked list keeping track of the components, and at the end, allocate the buffer size needed, and return the allocated buffer. This requires the user to free the buffer on the outside, but there really isn't any other safe way of doing this.
Alternatively, instead of a linked list, a buffer which is constantly reallocated can be used while building the path, constantly memmove()'ing the built components over to the higher part of the buffer.
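
A minimal sketch of the reallocating-buffer idea follows, written in the C subset so it should compile as either C or C++. The helper name `prepend_component` is hypothetical. Each call grows the buffer, moves what has been built so far toward the high end with memmove(), and writes the new component in front of it.

```cpp
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: prepends "/name" to the malloc()'d string *path
   (*path may be NULL for an empty path). On allocation failure it
   returns false and leaves *path untouched. The caller free()s *path. */
bool prepend_component(char **path, const char *name)
{
  size_t old_len = *path ? strlen(*path) : 0;
  size_t add_len = strlen(name) + 1; /* +1 for the leading slash */
  char *grown = (char *)realloc(*path, old_len + add_len + 1);
  if (!grown) { return false; }

  memmove(grown + add_len, grown, old_len); /* shift the built part upward */
  grown[0] = '/';
  memcpy(grown + 1, name, add_len - 1);
  grown[old_len + add_len] = '\0';

  *path = grown;
  return true;
}
```

Since the walk toward the root finds the deepest component first, calling this with "bob" and then "home" leaves the buffer holding "/home/bob".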

During the course of the rest of the program, all path manipulation should be using safe allocation managing strings such as std::string, or should be based off of the above described auto allocating getcwd() and similar functions, and constantly handling the memory management, growing as needed. Be careful when you need to get any path information from elsewhere, as you can never be sure how large it will be.

I hope developers realize that when not on Windows, using the incorrect define PATH_MAX is just wrong, and fix their applications. Next time, we'll discuss how one can implement their own realpath().

Dan said...

Hey, good to see you're finally back from your extended vacation. Bet the kids loved it, I know I always loved Disney World as a kid (hope they didn't make you go on "It's a Small World" too many times, that gets annoying after a while). Anyway, what you said about windows reminds me of a trick I used in high school to hide files on the school's network. I'd basically create a path as long as it would allow, put my games or whatever other forbidden files in there, and move the entire path once more into a new directory. Then when they tried to see what was in there, all they'd get is a recurrence of "New Folder/New Folder/New Folder/"etc. until they couldn't open it. It also wouldn't delete IIRC, and they weren't smart enough to realize to move it one level up, which is, of course, the method by which I would access my files there.

As for your wife's filesystem, what's she running? ReiserFS V5? Ext7? FAT4096? It's gotta be something cool if you're ripping mp5s.

The problem with PATH_MAX though, is that it's like a great sports play. You can't just rush into the score zone, you'd get a buffer overflow! But rather than implementing it sanely in a manner that you pass a buffer and a size, they decided to make their own number that has no relation whatsoever to the actual max path length. Sure, you could say something like "no one could possibly need a path longer than X", but we saw how well that worked when Gates Almighty stated that "640K ought to be enough for anybody". You just never know how much of anything will be enough for someone, and therein lies the flaw in many aspects of computing today.

In closing, stupid people suck.

L3thal said...

great topic , great explanation :)

fantastico said...

Windows allows approx 32k Unicode chars for the whole concatenated path, so long as:
1. You call the Unicode ('W') APIs rather than the OEM ('A') ones, eg CreateFileW; AND
2. You prepend the magic string '\\?\' to your path; AND
3. Your path is absolute or UNC, rather than relative.

I know this is weird and sounds unlikely, but I've personally written a test program to exercise this bizarre feature.

MS documentation here:
http://msdn.microsoft.com/en-us/library/aa365247%28VS.85%29.aspx#maxpath

For extra kicks, you can also do:
\\?\UNC\myhostname\mysharename\my\big\long\sequence\of\subdirs

fantastico said...

Dan: the problem with PATH_MAX is not stupidity of the designers. It's the history of UNIX. Please don't accuse the designers of stupidity without doing some research first.

Modern *NIXes support multiple filesystem types simultaneously, each one with its own values for its own limits of various kinds.

Worse, most modern *NIXes support something like loadable kernel filesystem modules. So now entire types of filesystem can come and go at the whim of the sysadmin.

Given this, the idea of a single, static max-length number is wrong, regardless of its value. But that does not mean that those who came up with the idea were stupid.

Back in the day, UNIX had none of these features. In those days, the PATH_MAX concept was:
a) sufficient for the purposes of the time;
b) simple for people to code to (compare modern sysconf);
c) something that could be implemented efficiently on a PDP-11.

These were the times when malloc() performance sucked, so it was to be avoided at almost any cost - including static buffers with no bounds checking.

In closing, judgemental ignorant people suck, and they make themselves look silly when they spout off on other peoples' blogs.

hawkturkey said...

Not sure about other OS's, but in UNIX System V, PATH_MAX was instituted to protect the (single threaded) kernel from getting hogged by a user who prankishly opened a path of the form ././. [continue for several megabytes...] /./foo.bar, which would make the system freeze for all its users until the kernel had resolved the path. On a machine that was challenged to achieve 1 mips performance, this could be many seconds. So an absolute limit was set and enforced.

weranga kaluarachchi said...

Hello,
My job involves license compliance at SAP. I have received a request for code posted on the following page "http://insanecoding.blogspot.ca/2007/11/pathmax-simply-isnt.html".

I was wondering whether you would grant SAP the right to use/copy/modify/distribute this code. Or, if possible, I request that you grant this code to SAP under the MIT license or another permissive license.

Thanks
Weranga

insane coder said...

I grant you the license to use this code intelligently if you document in the code where it came from.

Note, I don't even use this code. The article points out some issues with it, and the follow up articles improve upon it in various ways, and mention other possible improvements (all of which I have personally implemented but have not posted here). It is highly advisable to not use what you don't understand, and to fix things as needed for your use cases.
