Latest |Kites |Pictures |Programming |Life
[filed under Programming]Normalize URL path python

I had a problem with some URLs that we found on someones website. They looked like this:

<a href="">, here is the same link: test. Notice that when you mouse over it, Firefox normalizes the URL so it looks correct.

Using urlparse in Python:

print urlparse.urljoin( '', '/path/../path/.././path/./' )

How, poopy.

So, we need to do better than that. The os module has a path notmalizer in it, but this would go squiffy when run on a Windows box because it will translate all the / to \. But there is a module which we can use:

import urlparse
import posixpath

def join(base,url):
    join = urlparse.urljoin(base,url)
    url = urlparse.urlparse(join)
    path = posixpath.normpath(url[2])
    return urlparse.urlunparse(

# strange paths
print join( '''/path/../path/.././path/./' )
print join( '''/path/../path/.././path/./y.html' )

# paths that .. up too far
print join( '''../../../../path/' )
print join( '''../../../../path/moo.html' )

# how path and base path combine
print join( '''1/2/3/moo.html' )
print join( '''../1/2/3/moo.html' )

This prints:

Cool, huh?

8th of April, 2008@3:04:07 PM
add a comment, permanent link to article



Sorry, but until I can figure out how to stop the spam, comments are disabled :(

Server Grind [0.0219 seconds]