Category: Python

Lazy File Reading in Python

11/20/2013

I was watching a YouTube video about Haskell and its lazy file reading options and thought... wait a sec, I can do that in Python too. We have generators too :P

Generators are pretty cool when it comes to lazy evaluation. To the purists reading this post: I know that, strictly speaking, this may not be lazy evaluation, but it comes close.
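
To make the idea concrete, here is a minimal sketch of lazy file reading with a generator. The function name read_lines_lazily and the temporary sample file are made up for illustration; the point is only that each line is read when the caller asks for it.

```python
import os
import tempfile


def read_lines_lazily(path):
    """Yield one line at a time instead of loading the whole file."""
    with open(path) as handle:
        for line in handle:
            # Nothing is read from disk until the caller asks for the next item.
            yield line.rstrip("\n")


# Create a small sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")

lines = read_lines_lazily(tmp.name)
first_line = next(lines)   # only now is the file opened and one line read
remaining = list(lines)    # drains the rest and lets the file close
os.remove(tmp.name)

print(first_line)
print(remaining)
```

Calling read_lines_lazily() by itself does no I/O at all; the work happens only as the generator is consumed, which is about as close to Haskell-style laziness as plain Python gets.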

Interfacing C++ with Python

This is an extension of a problem I faced while messing with the boost::asio library. I finally attempted to run different async loops in multiple threads and failed spectacularly. Let me share some of my experiences with that. The following is the code using the Boost libraries; don't worry if you don't understand much of it. The easiest way to interface C++ with Python is to write a C wrapper for the C++ code and build a shared .so file from it. Python has an awesome (not so awesome, actually) library called ctypes that lets you call into .so files from your Python program.

Compiling this program seems straightforward, assuming Boost has already been set up.

$ g++ -c -fPIC scale.cpp -o scale.o -L/usr/lib -lboost_system -lboost_thread

$ g++ -shared -Wl,-soname,libscale.so -o libscale.so scale.o

So ideally this should work, right? I mean, why not: you have created your object file with the necessary Boost links and then created a shared .so from it. Let's create a simple Python wrapper for it and call the run module.

If you run it in an interpreter, voila: as soon as you load the library you will see the following error. Damn! It is not able to resolve the boost_thread symbols... But how is this possible?
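
For reference, a wrapper along these lines can be sketched as follows. This is not the original wrapper: the helper name load_scale_library is made up, and the sketch assumes the C wrapper exports a function called run that takes no arguments.

```python
import ctypes


def load_scale_library(path="./libscale.so"):
    """Try to load the shared library; return (library, error_message)."""
    try:
        return ctypes.CDLL(path), None
    except OSError as exc:
        # Missing files and unresolved symbols both surface as OSError.
        return None, str(exc)


lib, error = load_scale_library()
if lib is None:
    print("could not load library: %s" % error)
else:
    # Assumption: the wrapper exposes `extern "C" void run()`.
    lib.run()
```

Catching OSError at load time keeps the failure message in one place, which is handy when the real problem is a linking issue rather than the Python code.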

>>> lib = cdll.LoadLibrary('./libscale.so')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/ctypes/__init__.py", line 431, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python2.6/ctypes/__init__.py", line 353, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: ./libscale.so: undefined symbol: _ZTIN5boost6detail16thread_data_baseE

rahulram ~/programs/cpp $ ldd ./libscale.so
linux-gate.so.1 => (0xb789c000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb775f000)
libm.so.6 => /lib/libm.so.6 (0xb7723000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7705000)
libc.so.6 => /lib/libc.so.6 (0xb75a7000)
/lib/ld-linux.so.2 (0xb789d000)

If you look at the linked libraries, libscale.so does not include the Boost libraries. Why in the world would gcc not warn me about that? My best guess: -c only compiles, so the -l flags on the first command never reach the linker, and the linker happily allows undefined symbols when building a shared library. Finally I had to recreate the .so file, mentioning boost_system and boost_thread.

rahulram ~/programs/cpp $ g++ -shared -Wl,-soname,libscale.so -o libscale.so scale.o -lboost_system -lboost_thread

rahulram ~/programs/cpp $ ldd libscale.so
linux-gate.so.1 => (0xb783e000)
libboost_system.so.1.42.0 => /usr/lib/libboost_system.so.1.42.0 (0xb77e3000)
libboost_thread.so.1.42.0 => /usr/lib/libboost_thread.so.1.42.0 (0xb77cf000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb76e7000)
libm.so.6 => /lib/libm.so.6 (0xb76ab000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb768e000)
libc.so.6 => /lib/libc.so.6 (0xb7530000)
librt.so.1 => /lib/librt.so.1 (0xb7527000)
libpthread.so.0 => /lib/libpthread.so.0 (0xb750d000)
/lib/ld-linux.so.2 (0xb783f000)

Ooh la la.. it seems to have linked the desired libraries. Now you can re-run the same Python program and it should work.

PS: I learnt this the hard way :(


Memory Allocations in python

1/5/2013


Python memory allocation seemed pretty convoluted when I first began to wonder what happens to memory when an object is no longer being used. Well, memory allocation happens using malloc and free (Python is written in C), and these operations are pretty expensive (I will quantify that in a while).

Now, what you would usually think is that when I allocate memory and then delete it, the memory consumed by the object is freed. Guess what? It definitely is not, and I assure you that's not a bug in Python. Let's take a closer look at how this works.

Until Python 2.3-ish versions, Python primarily handled memory management using reference counting (Google it :P). Here is an interesting article (it's a must-read) on how memory handling happens when only reference counting is used. In summary, Python holds on to the memory when objects are deleted. Why? Because memory operations are expensive, so if your program has a sudden burst of object creations, without PyMalloc you would suffer huge performance issues. Since the Python interpreter holds the memory for you (which any decent high-level language should do), you don't have to worry about it, unless you intend to write a long-running daemon process and have thrown ints, floats and lists all over your program (ints, floats and lists are not handled via PyMalloc). It is definitely a BIG reason to worry if the code is running on a server for a very, very long time.
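
If reference counting sounds abstract, here is a tiny sketch that watches the count move. It assumes CPython (other interpreters may not count references at all), and the variable names are made up.

```python
import sys

payload = object()
before = sys.getrefcount(payload)

alias = payload                      # one more name pointing at the same object
during = sys.getrefcount(payload)

del alias                            # dropping the name drops the reference
after = sys.getrefcount(payload)

print(before, during, after)
```

The absolute numbers are not interesting (getrefcount itself holds a temporary reference while it runs); what matters is that the count goes up by one with the alias and back down after the del.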

So what do we do about it?
Good news. In later versions (probably Python 2.6 onwards, I am not too sure about that), Python uses reference counting together with garbage collection. I am not getting into the debate on whether GC is a good thing or not. Hard-core C++ programmers will never agree with me, as GC is not suitable for real-time systems, since it uses a significant amount of computing power to clean up memory.
Automatic garbage collection will not run if your Python device is running out of memory

In your Python program, you can "import gc" (the GC module in Python). By default GC is enabled in Python, unless you manually disabled it during installation.

import gc

gc.get_threshold()  # returns the collection thresholds (per-generation counts that trigger an automatic GC run)
gc.collect()  # runs a full collection now and returns the number of unreachable objects it found

If you are running a long daemon process, you can call gc.collect() after a function call, or on a time basis. It's up to you.
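
A time-based variant could look roughly like this. The class name PeriodicCollector and its interval are made up for illustration, and the sketch assumes Python 3 (for time.monotonic).

```python
import gc
import time


class PeriodicCollector(object):
    """Run gc.collect() at most once per interval."""

    def __init__(self, interval_seconds):
        self.interval = interval_seconds
        self.last_run = time.monotonic()

    def maybe_collect(self):
        """Collect if the interval has elapsed; return the count, else None."""
        now = time.monotonic()
        if now - self.last_run >= self.interval:
            self.last_run = now
            return gc.collect()
        return None


# An interval of 0 means "collect on every call", handy for a quick check.
collector = PeriodicCollector(interval_seconds=0)
result = collector.maybe_collect()
print(result)
```

In a daemon you would call maybe_collect() once per loop iteration with a sensible interval, so the collection cost is paid at a rate you choose.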

A few caveats to keep in mind:

You are better off using xrange than range to generate a sequence of numbers. Lists are pretty tricky in Python: they are immortal and unbound. For example:

l = range(20000)
del l
You may have deleted l, but all the memory pertaining to it is still held by the Python interpreter
(What?? I am still confused. Read the article I mentioned above). In such scenarios using xrange is
a better idea.
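
A quick way to see the difference is to compare sizes. This sketch assumes Python 3, where range is already lazy like Python 2's xrange and list(range(...)) plays the role of the old range; the variable names are made up.

```python
import sys

lazy_numbers = range(20000)          # lazy, like xrange in Python 2
eager_numbers = list(range(20000))   # fully built list, like range in Python 2

lazy_size = sys.getsizeof(lazy_numbers)
eager_size = sys.getsizeof(eager_numbers)
print(lazy_size, eager_size)

# The lazy object can still be iterated like the list.
total = sum(lazy_numbers)
print(total)
```

Note that getsizeof on the list only counts the list's own pointer array, not the integer objects it refers to, so the real gap is even larger than the two numbers suggest.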

Debug your code with Garbage Collector

Here we are creating a single object. The following is the output.

As you can see, the GC has detected the dictionary L to be unreachable, and after we run gc.collect() the object is freed.
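
A session of this kind can be sketched as below. This is not the original code: it assumes Python 3 and uses the gc.DEBUG_COLLECTABLE flag, which asks the collector to report the collectable objects it finds on stderr.

```python
import gc

gc.collect()                          # start from a clean slate
gc.set_debug(gc.DEBUG_COLLECTABLE)    # report collectable objects on stderr

L = {}
L["self"] = L   # the dictionary refers to itself, so its refcount never hits 0
del L           # now nothing else can reach it

unreachable = gc.collect()            # the cycle detector finds and frees it
gc.set_debug(0)                       # switch debugging output back off

print("unreachable objects found: %d" % unreachable)
```

Reference counting alone would never free this dictionary, because the self-reference keeps its count above zero; it takes the cycle detector to notice that nothing outside can reach it.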
