The XY problem is a classic case of a user or client asking for help with problem Y while actually intending to solve problem X, which ends up sapping everyone's time and energy. This is how it goes:
- User wants to do X.
- User doesn't know how to do X, but thinks they can fumble their way to a solution if they can just manage to do Y.
- User doesn't know how to do Y either.
- User asks for help with Y.
- Others try to help user with Y, but are confused because Y seems like a strange problem to want to solve.
- After much interaction and wasted time, it finally becomes clear that the user really wants help with X, and that Y wasn't even a suitable substitute for X.
Here's an example I encountered.
Problem Title: Pattern Matching and Regular Expression
The input file is in the following format and the data is to be crunched.
#
# <Title>
#
[Space]
[
Few lines of information about the title
]
[Space]
#
# <Title>
#
[Space]
[
Few lines of information about the title
]
The title is so obtuse. This is a simple case of structuring unstructured data. Anyway, after a lot of back-and-forth, I figured out what was really required and started brainstorming on it.
When it comes to Python and regex, I hesitate to use regex and try to avoid it as much as possible. Why?
I am bad at regex. I tend to miss edge cases and then get burned. Secondly, I have seen some deadly Perl regexes, and they can crunch data way faster than what Python could do. I don't intend to start a language debate, but most of the time I have seen Perl regexes do much better than Python's. You can compile the regex in Python (with re.compile) and reuse it for better performance. But still, this is one area where Perl dominates.
The initial solution I saw for the above problem was to read the whole file content into a string and run a compiled regex over it. Seems like a fair deal, but here is the problem:
- AFAIK str is immutable in Python, so any change to it produces a new copy in memory. So, if the file is too big, it may not fit into memory.
- Regex (yeah, I hate it!)
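For context, the load-everything approach might look roughly like the sketch below. This is only an illustration: the pattern and the names `SECTION_RE` and `parse_whole_text` are hypothetical, and it assumes that `[Space]` in the format above means a blank line and that the square brackets are just notation. Note that `text` has to hold the entire file before any matching can start.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: a bare "#" line, a "# <Title>" line, another bare "#"
# line, then everything up to the next header block or the end of the text.
SECTION_RE = re.compile(
    r"^#[ \t]*\n"                      # opening "#" line
    r"#[ \t]*(?P<title>[^\n]*?\S)[ \t]*\n"  # "# <Title>" line
    r"#[ \t]*\n"                       # closing "#" line
    r"(?P<body>.*?)"                   # info lines (lazy, spans newlines)
    r"(?=^#[ \t]*\n#[ \t]*\S|\Z)",     # stop before the next header or at the end
    re.MULTILINE | re.DOTALL,
)


def parse_whole_text(text: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs from text that is already fully in memory."""
    return [
        (m.group("title"), m.group("body").strip())
        for m in SECTION_RE.finditer(text)
    ]
```

Even in this small form, the pattern is easy to get subtly wrong, which is exactly the kind of edge-case risk mentioned above.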
I prefer reading the file line by line and then structuring it, because all we needed to extract was each title and its relevant context. My main concern was to avoid loading the whole file at once and to keep regex to a minimum.
I agree, the code is not exactly Pythonic. It can be made more Pythonic, but that exercise I left to the OP ;)
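As a rough illustration of the line-by-line idea (a sketch under assumptions, not the code referred to above), something like the following would work. The name `iter_sections` is hypothetical. It assumes that every line starting with `#` belongs to a header block, that a `#` line with text after it carries the title, that blank lines can be skipped, and that the square brackets in the format are only notation. Info lines that themselves start with `#` would break this.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sections(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (title, info_lines) pairs while consuming one line at a time."""
    title: Optional[str] = None
    info: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            text = line.lstrip("#").strip()
            if text:
                # A "# <Title>" line starts a new section; a bare "#" line
                # is only a separator and is ignored.
                if title is not None:
                    yield title, info
                title, info = text, []
        elif line and title is not None:
            # Any non-blank, non-header line belongs to the current title.
            info.append(line)
    if title is not None:
        # Emit the last section once the input is exhausted.
        yield title, info
```

Passing an open file object works here because iterating over it yields one line at a time, so only the current section is held in memory. No regex is involved at all, only `str.startswith` and `str.strip`.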