Non Greedy Matching in sed


If you've used sed for a while, you've probably encountered the problem that all your (group) matches are greedy. This is easy to illustrate with an example:

Say you have this text, where you want to convert the java classes surrounded with @-signs into HTML <code/> elements:

The @HashMap@ and @ArrayList@ classes are great!

To convert this into HTML, you'd typically create groups and wrap these two inside <code> elements:

echo 'The @HashMap@ and @ArrayList@ classes are great!' | sed 's#@\(.*\)@#<code>\1</code>#g'

However, sed will always do a greedy search, matching the longest possible match, not quite yielding the result you want:

The <code>HashMap@ and @ArrayList</code> classes are great!

The solution

What you need to do, is to match against everything but the closing delimiter, in our case, the '@'. This is done with the character group notation and the negator ^ in front of the closing delimiter:

@\([^@]*\)@

Now, we get what we want with both java classes wrapped in HTML <code> elements:

echo 'The @HashMap@ and @ArrayList@ classes are great!' | sed 's#@\([^@]*\)@#<code>\1</code>#g'
The <code>HashMap</code> and <code>ArrayList</code> classes are great!

There, another sed mystery solved. Yeah!


Licensed under CC BY Creative Commons License ~ ✉ torstein.k.johansen @ gmail ~ 🐘 @skybert@hachyderm.io ~ 🐦 @torsteinkrause