./MyRants

XY Problem

1/8/2013

The XY problem is a classic case of a user or client asking for help with problem Y when what they actually want to solve is problem X, and it ends up draining everyone involved. This is how it goes:
    • User wants to do X.
    • User doesn't know how to do X, but thinks they can fumble their way to a solution if they can just manage to do Y.
    • User doesn't know how to do Y either.
    • User asks for help with Y.
    • Others try to help user with Y, but are confused because Y seems like a strange problem to want to solve.
    • After much interaction and wasted time, it finally becomes clear that the user really wants help with X, and that Y wasn't even a suitable substitute for X.

Here's an example I encountered.

Problem Title:    Pattern Matching and Regular Expression
The input file is in the following format and the data is to be crunched.

#
# <Title>
#
[Space]
[
Few lines of information about the title
]

[Space]

#
# <Title>
#
[Space]
[
Few lines of information about the title
]
The title is so obtuse. This is really a simple case of structuring unstructured data. Anyway, after a lot of back-and-forth I figured out what was actually required and started brainstorming on it.

When it comes to regex in Python, I hesitate to use it; in fact, I avoid it as far as possible. Why?
I am bad at regex. I tend to miss edge cases and then get burned. Secondly, I have seen some deadly Perl regexes, and in my experience they crunch data much faster than Python's. I don't intend to start a language debate, but most of the time I have seen Perl regexes do better than Python's. You can compile a regex with re.compile and reuse it in Python for better performance, but this is still one area where Perl dominates.
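To make the "compile once, reuse" point concrete, here is a minimal sketch. The pattern and the helper name title_of are my own assumptions for illustration: I am assuming a title line is a "#" followed by whitespace and some text.

```python
import re

# Compile the pattern once and reuse it for every line.
# Assumed shape of a title line: "#", some whitespace, then the title text.
TITLE_RE = re.compile(r"^#\s+(?P<title>\S.*?)\s*$")


def title_of(line):
    """Return the title if `line` looks like '# <Title>', else None."""
    match = TITLE_RE.match(line)
    return match.group("title") if match else None
```

A bare "#" border line does not match, because the pattern insists on at least one non-space character after the "#".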

The initial solution I saw for the above problem was to read the whole file into a string and run a regex over it. Seems like a fair deal, but here is the problem:

  1. AFAIK str is immutable in Python, so any operation that changes it builds a new copy in memory. The whole file also has to sit in memory to begin with, so if the file is too big it may not fit.
  2. Regex (yeah, I hate it!)
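For contrast, the whole-file approach might look roughly like the sketch below. This is not the solution I actually saw; the pattern and the name parse_whole are assumptions. I am treating "[Space]" in the format as a blank line, the square brackets as notation for the body rather than literal characters, and I am assuming body lines never look like a header.

```python
import re

# One pattern for a whole section: a three-line header plus the body after it.
SECTION_RE = re.compile(
    r"^#[ \t]*\n"                      # top border line: "#"
    r"#[ \t]+(?P<title>[^\n]+)\n"      # title line: "# <Title>"
    r"#[ \t]*\n"                       # bottom border line: "#"
    r"(?P<body>.*?)"                   # body, matched lazily up to ...
    r"(?=^#[ \t]*\n#[ \t]+[^\n]|\Z)",  # ... the next header or end of text
    re.MULTILINE | re.DOTALL,
)


def parse_whole(text):
    """Return a list of (title, body) pairs; `text` is the entire file."""
    return [
        (m.group("title").strip(), m.group("body").strip())
        for m in SECTION_RE.finditer(text)
    ]
```

Note that the entire file has to be in `text` before the first match can even start, and the pattern is already fiddly enough that a missed edge case would be easy to overlook.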


I prefer reading the file line by line and then structuring it, because all we needed to extract was each title and its relevant context. My main concern was to avoid loading the whole file at once and to keep regex to a minimum.
I agree, the code is not exactly Pythonic. It can be made more Pythonic, but I left that exercise to the OP ;)
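A minimal sketch of the line-by-line idea, for illustration only and not the code originally posted. The names iter_sections and parse_file are hypothetical, and I am assuming that a bare "#" is a header border, that any other line starting with "#" carries a title, and that blank lines can be dropped.

```python
def iter_sections(lines):
    """Yield (title, body_lines) pairs from any iterable of text lines."""
    title = None
    body = []
    for raw in lines:
        line = raw.strip()
        if line == "#":
            continue  # border line of a header block, carries no data
        if line.startswith("#"):
            # A new title starts: hand over the section collected so far.
            if title is not None:
                yield title, body
            title, body = line[1:].strip(), []
        elif line and title is not None:
            body.append(line)  # non-blank line belonging to the current title
    if title is not None:
        yield title, body  # last section in the input


def parse_file(path):
    """Read `path` line by line, so only one section is held at a time."""
    with open(path) as handle:
        for title, body in iter_sections(handle):
            print(title, body)
```

Because a file object yields one line at a time, only the current section is held in memory, and no regex is involved at all.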
    Author
    Finally awakened from my comatose. When inspiration gives some insight, I write! . Victim of the Next Big Thing Syndrome & Pre-Optimization.
