Insane Coding: PATH_MAX simply isn't

Saturday, November 3, 2007

PATH_MAX simply isn't

Posted by insane coder at Saturday, November 03, 2007

Many C/C++ programmers at some point may run into a limit known as PATH_MAX. Basically, if you have to keep track of paths to files/directories, how big does your buffer have to be?
Most operating systems and file systems I've seen limit a filename or any particular path component to 255 bytes or so. But a full path is a different matter.

Many programmers will immediately tell you that if your buffer is PATH_MAX, or PATH_MAX+1 bytes, it's long enough. A good C++ programmer of course would use C++ strings (std::string or similar with a particular API) to avoid any buffer length issues. But even when having dynamic strings in your program taking care of the nitty gritty issue of how long your buffers need to be, they only solve half the problem.

Even a C++ programmer may at some point want to call the getcwd() or realpath() (_fullpath() on Windows) functions, which take a pointer to a writable buffer rather than a C++ string, and which, according to the standard, don't do their own allocation. Even implementations that do their own allocation very often just allocate PATH_MAX bytes.

getcwd() is a function to return what the current working directory is. realpath() can take a relative or absolute path to any filename, containing .. or levels of /././. or extra slashes, and symlinks and the like, and return a full absolute path without any extra garbage. These functions have a flaw though.

The flaw is that PATH_MAX simply isn't. Each system can define PATH_MAX to whatever size it likes. On my Linux system it's 4096, on my OpenBSD system it's 1024, and on Windows it's 260.
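To see what your own system claims, a minimal sketch like the following can help. The helper names are made up for illustration. It reports the compile-time PATH_MAX (if the platform defines one) and the value pathconf() reports for a given directory. Neither number is a promise about how long a real path can get.

```cpp
#include <climits>   // PATH_MAX, on platforms that define it
#include <unistd.h>  // pathconf(), _PC_PATH_MAX

// Hypothetical helper: the compile-time PATH_MAX, or -1 if the platform doesn't define one
long compiled_path_max()
{
#ifdef PATH_MAX
  return PATH_MAX;
#else
  return -1;
#endif
}

// Hypothetical helper: the limit the system reports for paths relative to dir.
// pathconf() returns -1 both on error and when there is no fixed limit.
long reported_path_max(const char *dir)
{
  return pathconf(dir, _PC_PATH_MAX);
}
```

Printing both values on a few different systems is a quick way to reproduce the differences listed above.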

Now performing a test on my Linux system, I noticed that it limits a path component to 255 characters on ext3, but it doesn't stop me from making as many nested ones as I like. I successfully created a path 6000 characters long. Linux does absolutely nothing to stop me from creating such a large path, nor from mounting one large path on another. Running getcwd() in such a large path, even with a huge buffer, fails, since it doesn't work with anything past PATH_MAX.

Even a commercial OS like Mac OS X defines it as 1024, but tests show you can create a path several thousand characters long. Interestingly enough, OSX's getcwd() will properly identify a path which is larger than its PATH_MAX if you pass it a large enough buffer with enough room to hold all the data. This is possible, because the prototype for getcwd() is:

char *getcwd(char *buf, size_t size);

So a smart getcwd() can work if there's enough room. But unfortunately, there is no way to determine how much space you actually need, so you can't allocate it in advance. You'd have to keep allocating larger and larger buffers, hoping one of them will finally work, which is wasteful and clumsy.
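For reference, the retry approach looks roughly like this. It is only a sketch: the function name, the starting size, and the upper limit are arbitrary choices of mine, and it only helps on systems whose getcwd() is willing to go past PATH_MAX when given a large enough buffer.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <unistd.h>

// Hypothetical helper: keeps doubling the buffer while getcwd() reports ERANGE.
// Returns an empty string on any other error, or once the buffer reaches limit.
std::string getcwd_growing(std::size_t initial = 256, std::size_t limit = 1u << 20)
{
  std::vector<char> buf(initial ? initial : 1); //A size of 0 would be rejected by getcwd()
  for (;;)
  {
    if (::getcwd(buf.data(), buf.size())) { return std::string(buf.data()); }
    if (errno != ERANGE || buf.size() >= limit) { return std::string(); } //Not a "too small" error, or we gave up
    buf.resize(buf.size() * 2); //Too small, try again with twice the room
  }
}
```

On a system like the Linux one described above, this still fails in a very deep directory, because the underlying getcwd() gives up no matter how much room it is offered.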

Since a path can be longer than PATH_MAX, the define is useless, writing code based off of it is wrong, and the functions that require it are broken.

An exception to this is Windows. It doesn't allow any paths to be created larger than 260 characters. If the path was created on a partition from a different OS, Windows won't allow anything to access it. It sounds strange that such a small limit was chosen, considering that FAT has no such limit imposed, and NTFS allows paths to be 32768 characters long. I can easily imagine someone with a sizable audio collection having a 300+ character path like so:

"C:\Documents and Settings\Jonathan Ezekiel Cornflour\My Documents\My Music\My Personal Rips\2007\Technological\Operating System Symphony Orchestra\The GNOME Musical Men\I Married Her For Her File System\You Don't Appreciate Marriage Until You've Noticed Tax Pro's Wizard For Married Couples.Track 01.MP5"

Before we forget, here's the prototype for realpath:

char *realpath(const char *file_name, char *resolved_name);

Now looking at that prototype, you should immediately say to yourself, but where's the size value for resolved_name? We don't want a buffer overflow! Which is why OSs will implement it based on the PATH_MAX define.

The resolved_name argument must refer to a buffer capable of storing at least PATH_MAX characters.

Which basically means, it can never work on a large path, and no clever OS can implement around it, unless it actually checks how much RAM is allocated on that pointer using an OS specific method - if available.
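One hedge worth noting: POSIX.1-2008, which was published after this post was written, specifies that realpath() allocates the result itself with malloc() when resolved_name is a null pointer. Whether a given implementation then copes with results longer than PATH_MAX is something you would have to test per system, so treat the following as a sketch under that assumption. The wrapper name is mine.

```cpp
#include <stdlib.h>  // realpath(), free()
#include <string>

// Hypothetical wrapper: asks realpath() to allocate the result itself.
// Assumes a system following POSIX.1-2008; older systems may reject a null buffer.
// Returns an empty string if the path can't be resolved.
std::string resolve_path(const char *path)
{
  char *resolved = ::realpath(path, NULL);
  if (!resolved) { return std::string(); }
  std::string result(resolved);
  free(resolved); //The caller owns the buffer realpath() allocated
  return result;
}
```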

For these reasons, I've decided to implement getcwd() and realpath() myself. We'll discuss the exact specifics of realpath() next time. For now, we will focus on how one can make their own getcwd().

The idea is to walk up the tree from the working directory, till we reach the root, along the way noting which path component we just went across.
Every modern OS has a stat() function which can take a path component and return information about it, such as when it was created, which device it is located on, and the like. All these OSs except for Windows return the fields st_dev and st_ino which together can uniquely identify any file or directory. If those two fields match the data retrieved in some other way on the same system, you can be sure they're the same file/directory.
To start, we determine the unique IDs for . and /. Once we have those, we can construct our loop. At each step, while the current directory's ID doesn't equal the root's, we change directory to .., then scan it (using opendir()+readdir()+closedir()) for an entry with the same ID as the directory we just left. Once a match is found, we record that entry's name as the path component for the current level, and move up one.
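The identity check on its own can be isolated into a small sketch. The helper names here are hypothetical, and it assumes a system whose stat() fills in st_dev and st_ino, so not Windows.

```cpp
#include <sys/types.h>
#include <sys/stat.h>
#include <utility>

typedef std::pair<dev_t, ino_t> file_id;

// Hypothetical helper: fills id with the device/inode pair for path, returns false if stat() fails
bool get_file_id(const char *path, file_id& id)
{
  struct stat sb;
  if (stat(path, &sb)) { return false; }
  id = file_id(sb.st_dev, sb.st_ino);
  return true;
}

// Two names refer to the same file or directory if both fields match
bool same_file(const char *a, const char *b)
{
  file_id id_a, id_b;
  return get_file_id(a, id_a) && get_file_id(b, id_b) && id_a == id_b;
}
```

For example, same_file("/", "/.") should be true, since both names lead to the root directory.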

Code demonstrating this in C++ is as follows:


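  #include <string>
  #include <vector>
  #include <utility>
  #include <cstring>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <dirent.h>

  bool getcwd(std::string& path)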
  {
    typedef std::pair<dev_t, ino_t> file_id;
    
    bool success = false;
    int start_fd = open(".", O_RDONLY); //Keep track of start directory, so can jump back to it later
    if (start_fd != -1)
    {
      struct stat sb;
      if (!fstat(start_fd, &sb))
      {
        file_id current_id(sb.st_dev, sb.st_ino);
        if (!stat("/", &sb)) //Get info for root directory, so we can determine when we hit it
        {
          std::vector<std::string> path_components;
          file_id root_id(sb.st_dev, sb.st_ino);

          while (current_id != root_id) //If they're equal, we've obtained enough info to build the path
          {
            bool pushed = false;

            if (!chdir("..")) //Keep recursing towards root each iteration
            {
              DIR *dir = opendir(".");
              if (dir)
              {
                dirent *entry;
                while ((entry = readdir(dir))) //We loop through each entry trying to find where we came from
                {
                  if ((strcmp(entry->d_name, ".") && strcmp(entry->d_name, "..") && !lstat(entry->d_name, &sb)))
                  {
                    file_id child_id(sb.st_dev, sb.st_ino);
                    if (child_id == current_id) //We found where we came from, add its name to the list
                    {
                      path_components.push_back(entry->d_name);
                      pushed = true;
                      break;
                    }
                  }
                }
                closedir(dir);

                if (pushed && !stat(".", &sb)) //If we have a reason to continue, we update the current dir id
                {
                  current_id = file_id(sb.st_dev, sb.st_ino);
                }
              }//Else, Uh oh, can't read information at this level
            }
            if (!pushed) { break; } //If we didn't obtain any info this pass, no reason to continue
          }

          if (current_id == root_id) //Unless they're equal, we failed above
          {
            //Build the path, will always end with a slash
            path = "/";
            for (std::vector<std::string>::reverse_iterator i = path_components.rbegin(); i != path_components.rend(); ++i)
            {
              path += *i+"/";
            }
            success = true;
          }
          fchdir(start_fd);
        }
      }
      close(start_fd);
    }

    return(success);
  }

Before we accept that as the de facto method to use in your application, let us discuss the flaws.

As mentioned above, it doesn't work on Windows, but a simple #ifdef for Windows can just make it a wrapper around the built in getcwd() with a local buffer of size PATH_MAX, which is fine for Windows, and pretty much no other OS.

This function uses the name getcwd() which can conflict with the built in C based one which is a problem for certain compilers. The fix is to rename it, or put it in its own namespace.

Next, the built in getcwd() implementations I checked only have a trailing slash on the root directory. I personally like having the slash appended, since I'm usually concatenating a filename onto it, but note that if you're not using it for concatenation, but to pass to functions like access(), stat(), opendir(), chdir(), and the like, an OS may not like doing the call with a trailing slash. I've only noticed that being an issue with DJGPP and a few functions. So if it matters to you, the loop near the end of the function can easily be modified to not have the trailing slash, except in the case that the root directory is the entire path.
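A sketch of that modification, pulled out into a standalone function for clarity. The function name is mine, and it expects the components in the same deepest-first order that the loop above collects them in.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of the final loop: no trailing slash, except when the root is the entire path.
// path_components is ordered deepest directory first, as collected while walking up the tree.
std::string join_without_trailing_slash(const std::vector<std::string>& path_components)
{
  if (path_components.empty()) { return "/"; } //We started in the root directory

  std::string path;
  for (std::vector<std::string>::const_reverse_iterator i = path_components.rbegin(); i != path_components.rend(); ++i)
  {
    path += "/" + *i; //Slash goes before each component instead of after
  }
  return path;
}
```

With the components "music", "jon", "home" collected in that order, this yields /home/jon/music.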

This function also changes the working directory in the process, so it's not thread safe. But then again, many built in implementations aren't thread safe either. If you use threads, calculate all the paths you need prior to creating the threads. That is probably a good idea anyway: keep using path names based off of your absolute directories, instead of changing directories elsewhere during the main execution of the program. Otherwise, you'll have to use a mutex around the call, which is also a valid option.
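If you go the mutex route, a minimal sketch could look like the following. It uses std::mutex, so it assumes a C++11 compiler, and the names are hypothetical. The important caveat is that it only helps if everything in the program that changes or depends on the working directory goes through the same lock.

```cpp
#include <mutex>

// One process-wide lock guarding anything that touches the working directory
std::mutex& cwd_mutex()
{
  static std::mutex m;
  return m;
}

// Hypothetical wrapper: runs fn while holding the lock and passes its result back
template <typename Fn>
auto with_cwd_locked(Fn fn) -> decltype(fn())
{
  std::lock_guard<std::mutex> guard(cwd_mutex());
  return fn();
}
```

A call to the function above would then be wrapped in a lambda handed to with_cwd_locked(), and so would any chdir() calls made elsewhere.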

There could also be the issue that some level of the path isn't readable. This can happen on UNIX, where entering a directory only requires execute permission, not read permission. I'm not sure what one can do in that case, except maybe fall back on the built in one, hoping it does some magical kernel call to get around it. If anyone has any advice on this one, please post about it in the comments.

Lastly, this function is written in C++, which is annoying for C users. The std::vector can be replaced with a linked list keeping track of the components, and at the end, allocate the buffer size needed, and return the allocated buffer. This requires the user to free the buffer on the outside, but there really isn't any other safe way of doing this.
Alternatively, instead of a linked list, a buffer which is constantly reallocated can be used while building the path, constantly memmove()'ing the built components over to the higher part of the buffer.
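The reallocating variant can be sketched roughly as follows, using only C library calls so it ports to C with little more than dropping the casts and the std:: prefixes. The function name is made up, and this version leaves off the trailing slash.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: prepends "/name" to the NUL terminated string in *path, growing the buffer.
// *path may be NULL to start with. Returns false if allocation fails, leaving *path untouched.
// The caller must free(*path) when done.
bool prepend_component(char **path, const char *name)
{
  std::size_t old_len = *path ? std::strlen(*path) : 0;
  std::size_t add_len = std::strlen(name) + 1; //+1 for the leading slash

  char *grown = static_cast<char *>(std::realloc(*path, old_len + add_len + 1));
  if (!grown) { return false; }

  if (old_len) { std::memmove(grown + add_len, grown, old_len + 1); } //Shift what we have, terminator included
  else { grown[add_len] = '\0'; } //First component, just terminate

  grown[0] = '/';
  std::memcpy(grown + 1, name, add_len - 1);
  *path = grown;
  return true;
}
```

Calling it with each component as it is found while walking up the tree ("c", then "b", then "a") leaves the buffer holding /a/b/c.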

During the course of the rest of the program, all path manipulation should be using safe allocation managing strings such as std::string, or should be based off of the above described auto allocating getcwd() and similar functions, and constantly handling the memory management, growing as needed. Be careful when you need to get any path information from elsewhere, as you can never be sure how large it will be.

I hope developers realize that when not on Windows, using the incorrect define PATH_MAX is just wrong, and fix their applications. Next time, we'll discuss how one can implement their own realpath().

