Sunday, January 8, 2012

Deleting all JS comments from JSP page at build time: Part 1


Deleting JS comments from JSP pages

I have been asked to delete all the JS comments from our JSP pages. The task is tedious and daunting: deleting even a single unexpected character can make the application unstable. If I destabilize a system that has been stable for 2-3 years, I am fired. And if I have to sit and delete all the comments manually, I will set myself on fire. Seriously.

So this is the time for some brainstorming to avoid both of those situations. I started with regexes to find the script blocks in a JSP. After 1-2 hours of struggle, I came up with these:

<script[\\S\\s]*?>    - This regex finds every opening script tag.
<script[\\S\\s]*?((/>)|(</script>))   - This finds every whole script block, including self-closing tags that pull in an external JS file.
</script  - This regex finds every closing script tag. (Attention: there is no closing angle bracket at the end. This is to cope with cases like </script --%>. Yes, some developers comment out an entire script block with JSP comments.)
(The backslashes are doubled because the regexes are written here as Java string literals.)
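
As a rough illustration of how the second regex could be used from Java, here is a minimal sketch. The class name ScriptBlockFinder and the sample JSP string are made up for this example; only the pattern itself comes from the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original post.
public class ScriptBlockFinder {
    // The whole-block regex from above, as a Java string literal.
    private static final Pattern SCRIPT_BLOCK =
            Pattern.compile("<script[\\S\\s]*?((/>)|(</script>))");

    // Collects every script block found in the given JSP source.
    public static List<String> findScriptBlocks(String jsp) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SCRIPT_BLOCK.matcher(jsp);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Made-up sample input: one external script, one inline script.
        String jsp = "<html><script src=\"a.js\"/>\n"
                + "<script type=\"text/javascript\">\nvar x = 1; // one\n</script></html>";
        for (String block : findScriptBlocks(jsp)) {
            System.out.println("---\n" + block);
        }
    }
}
```

One caveat with this pattern: because the match is lazy and stops at the first /> it sees, a script body that itself contains /> (for example inside a document.write string) would be cut short, so the result is worth checking on real pages.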

Now I have 3 tools to pull all the script blocks out of a JSP. I am doing this to make sure that I delete only JS comments and nothing else.

The second step is to delete all the multi-line comments first. Why multi-line comments first and not inline comments?
After 2 hours of testing both orders, I have the answer. What would happen if I deleted inline comments first and hit something like this:


<script type="text/javascript">
/*
            document.write("<h1>This is a heading</h1>");
            document.write("<p>This is a paragraph.</p>");
            document.write("<p>This is another paragraph.</p>");
//*********************
//*/
</script>

The second and third comments (the two // lines) would get deleted, and the closing */ would go with them. When I then delete multi-line comments, the comment is no longer terminated, so it is not detected at all. And I am screwed.
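
The ordering problem can be reproduced with a small Java sketch. The two patterns here are simplified stand-ins invented for this demonstration (the inline one only matches lines that start with //), and the class name CommentOrderDemo is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
public class CommentOrderDemo {
    // Simplified patterns, assumed only for this illustration.
    static final Pattern MULTI_LINE = Pattern.compile("/\\*[\\S\\s]*?\\*/");
    static final Pattern INLINE = Pattern.compile("(?m)^[ \\t]*//.*$");

    public static String stripMultiLine(String js) {
        return MULTI_LINE.matcher(js).replaceAll("");
    }

    public static String stripInline(String js) {
        return INLINE.matcher(js).replaceAll("");
    }

    public static void main(String[] args) {
        String js = "/*\n"
                + "document.write(\"<h1>This is a heading</h1>\");\n"
                + "//*********************\n"
                + "//*/\n";

        // Inline first: the line holding the closing */ is removed,
        // so the multi-line pattern has nothing to match afterwards.
        String inlineFirst = stripMultiLine(stripInline(js));
        // Multi-line first: the whole block goes in one match.
        String multiFirst = stripInline(stripMultiLine(js));

        System.out.println("inline first still has /*: " + inlineFirst.contains("/*"));    // true
        System.out.println("multi-line first still has /*: " + multiFirst.contains("/*")); // false
    }
}
```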

So I decided to delete the multi-line comments first. What I am trying to do is search for a pattern inside an already matched pattern. If you know how to do this, please let me know.

I have tried some regexes; here are my findings.

This regex
(<script([\S\s]*?)>)(([\S\s]*?)([^:'\-"]//.*[^'"]$)([\S\s]*?))(</script>)

keeps only the first comment in the backreference, hence only the first comment in a script block can be deleted.
And this regex

(<script([\S\s]*?)>)(([\S\s]*?)([^:'\-"]//.*[^'"]$)([\S\s]*?))*(</script>)

keeps only the last comment in the backreference, hence this will not work either.
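
The "only the last one survives" behaviour is how java.util.regex treats a capturing group inside a repeated group: each repetition overwrites the previous capture. Here is a tiny sketch on a toy input, with made-up names (RepeatedGroupDemo, lastCapture, allMatches), next to a find() loop that does see every match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
public class RepeatedGroupDemo {
    // Group 1 sits inside a repeated group, so it only remembers
    // what it captured on the final repetition.
    public static String lastCapture(String csv) {
        Matcher m = Pattern.compile("(?:(\\w+),?)*").matcher(csv);
        return m.matches() ? m.group(1) : null;
    }

    // Calling find() in a loop visits every match, one at a time.
    public static List<String> allMatches(String csv) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(csv);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("a,b,c")); // prints c
        System.out.println(allMatches("a,b,c"));  // prints [a, b, c]
    }
}
```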

I have to delete all the comments in a script block. So I decided to write a program that pulls all the script blocks out of the JSP, then deletes all the multi-line comments first and then all the inline comments.
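
One way such a program could search for a pattern inside a matched pattern is an outer Matcher loop over script blocks with a second replacement pass on each block body. This is only a sketch under assumptions. The class name JsCommentStripper and all three patterns are invented for illustration: the opening-tag pattern assumes no > inside tag attributes and skips self-closing tags, and the inline pattern is a simplified variant that would still be fooled by // inside most string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the ANT task from the next post.
public class JsCommentStripper {
    // Group 1: opening tag (not self-closing), group 2: body, group 3: start of closing tag.
    private static final Pattern SCRIPT_BLOCK =
            Pattern.compile("(<script[^>]*(?<!/)>)([\\S\\s]*?)(</script)");
    private static final Pattern MULTI_LINE = Pattern.compile("/\\*[\\S\\s]*?\\*/");
    // Skips // preceded by : ' " or - (e.g. the // in http://).
    private static final Pattern INLINE = Pattern.compile("(?m)(?<![:'\"\\-])//.*$");

    public static String stripComments(String jsp) {
        Matcher m = SCRIPT_BLOCK.matcher(jsp);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(2);
            // Multi-line comments first, then inline comments.
            body = MULTI_LINE.matcher(body).replaceAll("");
            body = INLINE.matcher(body).replaceAll("");
            // quoteReplacement keeps $ and \ in the JS from being read as group references.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + body + m.group(3)));
        }
        m.appendTail(out); // everything outside script blocks is copied untouched
        return out.toString();
    }

    public static void main(String[] args) {
        String jsp = "<p>// not js</p><script type=\"text/javascript\">\n"
                + "/* a\n//*/\nvar u = \"http://x\"; // trailing\n</script>";
        System.out.println(stripComments(jsp));
    }
}
```

Because only group 2 is rewritten, text outside script blocks (including anything that merely looks like a comment) is left alone, which is the whole point of isolating the blocks first.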

Writing a Java program from scratch and setting up all the infrastructure it needs, like logging and file and directory handling, is a waste of time; I want to focus on the main task. So I will write only the task itself: an ANT task.
With the plan set, I started writing an ANT task to delete all the JS comments from JSP files.

I will discuss the ANT task in my next post.
