I was watching a YouTube video about Haskell and its lazy file reading, and I thought... wait a sec, I can do that in Python too. We have generators too :P
Generators are pretty cool when it comes to lazy evaluation. Now, for the purists reading this post... I know this may not strictly be lazy evaluation, but it comes close.
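As a quick sketch of the idea (the function name and file are mine, purely for illustration), a generator reads a file one line at a time instead of slurping it into memory:

```python
def read_lines(path):
    # A generator: each line is produced on demand, only when the
    # caller asks for the next one -- the whole file is never loaded.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

# Nothing is actually read until you iterate:
# for line in read_lines("big.log"):
#     process(line)
```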
One of my favorite one-liners is converting a list of key-value pairs into a dictionary. It's awesome how easily I can do it in Python.
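For instance (assuming the items are really two-element key-value pairs, which is what dict() expects; the data here is made up):

```python
pairs = [("alice", 1), ("bob", 2)]
d = dict(pairs)                      # one-liner: list of pairs -> dict
print(d)                             # {'alice': 1, 'bob': 2}

# A dict comprehension works too, when you need to transform as you go:
squares = {n: n * n for n in range(4)}
print(squares)                       # {0: 0, 1: 1, 2: 4, 3: 9}
```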
This is an extension of a problem I faced while messing with the boost::asio library. I finally attempted to run different async loops in multiple threads and failed spectacularly. Let me share some of my experiences with it. The following code uses the Boost libraries; don't worry if you don't understand it all. The easiest way to interface C++ with Python is to write a C wrapper around it and then build a shared .so file from it. Python has an awesome (well, not always so awesome) library called ctypes that lets you call into .so files from your Python program.
So, compiling this program seems straightforward, assuming Boost has already been set up.
$ g++ -c -fPIC scale.cpp -o scale.o -L/usr/lib -lboost_system -lboost_thread
So ideally this should work, right? I mean, why not: you have created your object file with the necessary Boost link flags and then built a shared .so from it. Let's create a simple Python wrapper for it and call the run function.
If you run it in an interpreter... voila, as soon as you load the library you will see the following error. Damn! It can't resolve the boost_thread symbols... But how is that possible?
>>> lib = cdll.LoadLibrary('./libscale.so')
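For reference, the wrapper itself is tiny (a sketch: the exported function name run comes from the post, but its signature is my assumption):

```python
from ctypes import CDLL

def load_scale(path="./libscale.so"):
    # CDLL raises OSError at load time if the .so has unresolved
    # symbols -- exactly the failure described above.
    lib = CDLL(path)
    lib.run.restype = None  # assuming the C wrapper exports: void run(void)
    return lib
```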
If you inspect the linked libraries (ldd libscale.so), libscale.so does not list the Boost libraries at all. Why on earth did the compiler not warn me about that? (It turns out the -l flags are simply ignored during a compile-only -c step, and by default the linker allows undefined symbols in a shared library.) Finally I had to recreate the .so, this time explicitly mentioning boost_thread and boost_system:
rahulram~/programs/cpp $ g++ -shared -Wl,-soname,libscale.so -o libscale.so scale.o -lboost_system -lboost_thread
Ooh la la... it seems to have linked the desired libraries. Now re-run the same Python program; it should work.
PS: I learnt this the hard way :(
The XY problem is a classic case of a user or client asking about problem X while actually trying to solve problem Y, and ultimately sapping everyone's time. So, this is how it goes.
Here's an example I encountered.
Problem Title: Pattern Matching and Regular Expression
The input file is in the following format and the data is to be crunched.
The title is so obtuse. This is a simple case of structuring unstructured data. Anyway, after a lot of hotch-potch I figured out what was really required and started brainstorming on it.
When it comes to regex in Python, I kind of hesitate to use it; in fact I try to avoid it as much as possible. Why?
First, I am bad at regex. I tend to miss edge cases and then get screwed. Secondly, I have seen some deadly Perl regexes that can crunch data way faster than what my Python could do. I don't intend to start a language debate, but most of the time I have seen Perl regexes do much better than Python's. You can compile the regex in Python for better performance, but this is still one area where Perl dominates.
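For what it's worth, compiling the pattern once does help when it is reused in a loop (the pattern and sample data below are mine, not from the original problem):

```python
import re

# Compile once, reuse many times instead of re-parsing the pattern
# on every call.
title_re = re.compile(r"^Title:\s*(.+)$")

lines = ["Title: Foo", "body text", "Title: Bar"]
titles = [m.group(1) for m in map(title_re.match, lines) if m]
print(titles)  # ['Foo', 'Bar']
```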
The initial solution I saw for the above problem was to fetch the whole file content into a string and run a compiled regex over it. Seems like a fair deal, but here is the problem.
I prefer reading the file line by line and structuring it as I go, because all we needed to extract was each title and its relevant context. My main concern was to avoid loading the whole file at once and to keep regex to a minimum.
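A sketch of that line-by-line approach (the "Title:" marker and the field layout are my invention, since the original input format isn't shown here):

```python
def structure(lines):
    # Walk the input once, grouping each title with the lines under it.
    records, current = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith("Title:"):
            current = line[len("Title:"):].strip()
            records[current] = []
        elif current is not None and line:
            records[current].append(line)
    return records

sample = ["Title: Foo", "ctx one", "", "Title: Bar", "ctx two"]
print(structure(sample))  # {'Foo': ['ctx one'], 'Bar': ['ctx two']}
```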
I agree, the code is not exactly Pythonic. It can be made more Pythonic, but I leave that exercise to the OP ;)
Python memory allocation seemed pretty convoluted when I first began to wonder what happens to memory once an object is no longer used. Well, allocation ultimately happens using malloc and free (CPython is written in C), and these operations are pretty expensive (I will quantify that in a while).
Now, you would usually think that when I allocate an object and then delete it, the memory consumed by the object is freed. Guess what? It definitely is not, and I assure you that's not a bug in Python. Let's take a closer look at how this works.
Until around Python 2.3, Python handled memory management primarily through reference counting (Google it :P). Here is an interesting article (a must-read) on how memory handling happens when only reference counting is used. To summarize: Python holds on to memory even after objects are deleted. Why? Because memory operations are expensive, so if your program has a sudden burst of object creation, without pymalloc you would suffer a huge performance hit. Since the Python interpreter holds the memory for you (which any decent high-level language should do), you don't have to worry about it, unless you intend to write a long-running daemon process that throws ints, floats and lists all over the place (ints, floats and lists are not handled via pymalloc; they keep their own free lists). But it is definitely a big reason to worry if the code runs on a server for a very, very long time.
So what do we do about it ?
Good news. In later versions (probably Python 2.6 onwards, I am not too sure about that) Python supplements reference counting with a cyclic garbage collector. I am not getting into the debate on whether GC is a good thing or not. Hard-core C++ programmers will never agree with me, as GC is not suitable for real-time systems, since it uses a significant amount of computing power to clean up memory.
Note that automatic garbage collection is triggered by allocation counts, not by memory pressure: it will not run just because your Python process is running out of memory.
In your Python program, you can "import gc" (Python's garbage-collector interface). By default GC is enabled in Python, unless you manually disable it with gc.disable().
gc.get_threshold() # Returns the collector's allocation-count thresholds (not bytes); a collection runs once these are exceeded
gc.collect() # Collects unreachable objects and frees up memory for you
If you are running a long daemon process, you can call gc.collect() periodically, for example after certain function calls or on a timer. It's up to you.
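Putting those calls together (the threshold values shown are typical CPython defaults; yours may differ):

```python
import gc

print(gc.isenabled())      # True: the cycle collector is on by default
print(gc.get_threshold())  # e.g. (700, 10, 10): allocation counts, not bytes
freed = gc.collect()       # force a full collection right now
print("unreachable objects found:", freed)
```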
A few caveats to keep in mind:
You may have deleted l, but all the memory pertaining to it is still being held by the Python interpreter (What?? Still confused? Read the article I mentioned above). In such scenarios, using xrange is a better idea.
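To see why, compare the two. The post is about Python 2's xrange; the demonstration below uses Python 3, where range itself is the lazy object xrange used to be:

```python
import sys

# A lazy range is a constant-size object no matter how many numbers
# it describes; materializing them as a list costs memory per element.
lazy = range(10**8)
eager = list(range(10**4))
print(sys.getsizeof(lazy))   # small and constant
print(sys.getsizeof(eager))  # grows with the number of elements
```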
Here we are creating a single object. The following is the output.
As you can see, the GC has detected that the dictionary L is unreachable, and after we run gc.collect() the object is freed.