EDIT: Can anyone help me out with a regular expression for a string such as this:

[Header 1], [Head,er 2], Header 3

so that I can split this into chunks like:

[Header 1]
[Head,er 2]
Header 3

I have gotten as far as this:

(?<=,|^).*?(?=,|$)

Which will give me:

[Header 1]
[Head
,er 2]
Header 3

    
How many CSV implementations does the world need??? –  Joachim Sauer Apr 8 '09 at 21:55
    
Is this a homework question? Because I find it simpler to just use plain old manipulation - basically: for each char: if char is comma and not inside a bracket then add current string to list –  Lucas Jones Apr 8 '09 at 22:11

6 Answers


In this case it's easier to split on the delimiters (commas) than to match the tokens (or chunks). Identifying the commas that are delimiters takes a relatively simple lookahead:

,(?=[^\]]*(?:\[|$))

Each time you find a comma, you do a lookahead for one of three things. If you find a closing square bracket first, the comma is inside a pair of brackets, so it's not a delimiter. If you find an opening bracket or the end of the line/string, it's a delimiter.
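
As a rough illustration, here is a minimal Python sketch of splitting on that pattern. The function name `split_outside_brackets` is hypothetical, trimming the whitespace around each chunk is an assumption about the desired output, and the sketch assumes brackets are never nested.

```python
import re

# A comma is a delimiter if, scanning forward, no "]" appears before
# the next "[" or the end of the string.
DELIMITER = re.compile(r",(?=[^\]]*(?:\[|$))")


def split_outside_brackets(text):
    """Split on delimiter commas and trim whitespace around each chunk."""
    return [chunk.strip() for chunk in DELIMITER.split(text)]


print(split_outside_brackets("[Header 1], [Head,er 2], Header 3"))
# ['[Header 1]', '[Head,er 2]', 'Header 3']
```

The comma inside `[Head,er 2]` is skipped because the lookahead reaches the closing `]` before it finds a `[` or the end of the string.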

    
Ah I see, I can replace the commas with another special char and split accurately using that. That'll work for me! Thanks! –  Nate Apr 15 '09 at 19:08
    
This works perfectly as long as there are no nested brackets. For example, it works as expected for [a],[b],[c[d,e]] but fails for [a],[b],[c,[d,e]]: it matches the comma next to c in the last example. How can this be improved so it does not match that comma as well? –  matte Jul 9 '12 at 14:38
    
Actually, to be more precise for [a],[b,[] it matches the comma after b. If there is any opening square bracket in [], this pattern matches the comma in the brackets. –  matte Jul 9 '12 at 14:53
    
I think if brackets can be nested, split is no longer an option; you would have to match the tokens. And if they can be nested more than one level deep, regex might not be an option at all. (Many flavors can handle nesting to arbitrary depth, but it's ugly as hell.) –  Alan Moore Jul 10 '12 at 0:51
    
I have a quick question: why is "this is me".split(/(\s)/); different from "this is me".split(/\s/);? It only happens with split, not with .match, for example. (JS) –  Muhammad Umer Aug 31 '13 at 19:58

You could either use a regular expression to match the values inside the brackets:

\[[^\]]*\]

Or you could use this regular expression to split the bracketed list (using look-around assertions):

(?<=]|^)\s*,\s*(?=\[|$)

\[.*?\]

Forget the commas, you don't care about them. :)
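
For what it's worth, a small Python sketch of matching the bracketed values directly, assuming every value is bracketed (as this answer does). The name `BRACKETED` and the first sample input are illustrative assumptions, not part of the question.

```python
import re

# Lazy match: from "[" to the nearest following "]".
BRACKETED = re.compile(r"\[.*?\]")

# Every value bracketed: commas inside and outside brackets are ignored.
print(BRACKETED.findall("[Header 1], [Head,er 2], [Header 3]"))
# ['[Header 1]', '[Head,er 2]', '[Header 3]']

# With the edited question's input, the unbracketed value is not matched.
print(BRACKETED.findall("[Header 1], [Head,er 2], Header 3"))
# ['[Header 1]', '[Head,er 2]']
```

The second call shows why this approach alone does not cover inputs that mix bracketed and unbracketed values.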

    
Good answer, but he changed the question on you... –  dmckee Apr 8 '09 at 22:46
    
Well, now I'm confused. Does it really say Header or is that some placeholder? Are the brackets really there or optional? It has now become confusing exactly what the valid input strings are. –  JP Alioto Apr 8 '09 at 23:56
    
Sorry about changing it. Valid input strings look like [Some Text], Some More Text, [Yet mo,re Text] and should be split into [Some Text] / Some More Text / [Yet mo,re Text] –  Nate Apr 9 '09 at 15:33

(?<=,|^)\s*\[[^]]*\]\s*(?=,|$)

Use the [ and ] delimiters to your advantage.


Isn't it as simple as this?

(?<=,|^)(?:[^,]|\[[^[]*\])*
    
When I use your regex, I get the following from the dev tools: regex = /(?<=,|^)(?:[^,]|\[[^[]*\])*/ SyntaxError: Invalid regular expression: /(?<=,|^)(?:[^,]|\[[^[]*\])*/: Invalid group –  starbeamrainbowlabs Jan 4 '13 at 18:47
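
The "Invalid group" error is most likely the lookbehind, which JavaScript engines did not support at the time. As a hedged alternative, here is a Python sketch of the same token-matching idea without any lookbehind. The pattern and the names `TOKEN` and `match_tokens` are my own assumptions, not the answer's regex, and the sketch assumes brackets are not nested.

```python
import re

# One token: a run of bracketed groups and/or characters that are neither
# a comma nor an opening bracket. Excluding "[" from the second alternative
# forces an opening bracket to be matched by the bracketed alternative,
# so commas inside [...] stay in the token.
TOKEN = re.compile(r"(?:\[[^\]]*\]|[^,\[])+")


def match_tokens(text):
    """Return the comma-separated tokens, trimmed of surrounding whitespace."""
    return [token.strip() for token in TOKEN.findall(text)]


print(match_tokens("[Header 1], [Head,er 2], Header 3"))
# ['[Header 1]', '[Head,er 2]', 'Header 3']
```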

Variations of this question have been discussed before.

For instance:

Short answer: regular expressions are probably not the right tool for this. Write a proper parser. An FSM implementation is easy.
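
A minimal sketch of such a hand-written scanner in Python, along the lines of the per-character approach suggested in the question's comments. It uses a bracket depth counter rather than a strict finite-state machine so that nested brackets also work. The function name `split_top_level` and its parameters are hypothetical, and trimming whitespace is an assumption about the desired output.

```python
def split_top_level(text, sep=",", open_ch="[", close_ch="]"):
    """Split text on sep, ignoring separators inside brackets."""
    chunks = []
    current = []
    depth = 0  # number of currently open brackets
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Top-level separator: close the current chunk.
            chunks.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    chunks.append("".join(current).strip())
    return chunks


print(split_top_level("[Header 1], [Head,er 2], Header 3"))
# ['[Header 1]', '[Head,er 2]', 'Header 3']
print(split_top_level("[a],[b],[c,[d,e]]"))
# ['[a]', '[b]', '[c,[d,e]]']
```

Unbalanced closing brackets are simply ignored here; a stricter parser might raise an error instead.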
