RegExp regular expression find & replace whole words only

Jacob D

I should preface this by saying that I'm working with VB6 and RegExp.

I'm attempting to find and replace whole words. By "whole words" I mean that a valid match is not a substring of another word, although it may sit next to certain special characters. I'm a novice at regular expressions. This is what I was trying:

([^a-z]+)(Foo)([^a-z]+)

It seems close but I'm having some trouble in certain situations.

For example, given the string

Foo Foo

or

Foo(Foo)

or anywhere a line ends with Foo and the next line begins with Foo:

This is a line with Foo
Foo starts the next line

In any of these cases only the first Foo is matched.

Then again, maybe the problem isn't the match but my replace call; I'm not sure how to verify that. I'm using groups to put back whatever boundary characters the expression matched, like so:

regEX.Replace(source, "$1" & newstring & "$3")
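One way to tell whether the match or the replace is at fault is to run the pattern outside VB6. Below is a minimal sketch using Python's re module, chosen only because it is easy to run; for this pattern it should behave like VB6's RegExp with IgnoreCase = True and Global = True, but that setup is an assumption. ORIGINAL and replace_original are hypothetical names.

```python
import re

# The pattern from the question. re.IGNORECASE stands in for RegExp.IgnoreCase = True
# (an assumption about the original setup), and re.sub replaces every match,
# like RegExp.Global = True.
ORIGINAL = re.compile(r"([^a-z]+)(Foo)([^a-z]+)", re.IGNORECASE)


def replace_original(source: str, new: str) -> str:
    # Same idea as regEX.Replace(source, "$1" & newstring & "$3"):
    # put the captured boundary characters back around the new text.
    return ORIGINAL.sub(lambda m: m.group(1) + new + m.group(3), source)


# The space between the two words is consumed by group 3 of the first match,
# so the second "Foo" has no non-letter left in front of it for group 1.
print(repr(replace_original(" Foo Foo ", "Bar")))  # ' Bar Foo '

# At the very start and end of the input there is no character for the
# [^a-z]+ groups to match, so nothing is replaced here.
print(repr(replace_original("Foo Foo", "Bar")))    # 'Foo Foo'
```

This suggests the replace call itself is fine and the pattern is the culprit: each match consumes a non-letter on both sides of Foo, so two occurrences separated by a single character cannot both match.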

So, in summary, I want to avoid matching:

FooBar
BarFoo

Any of the following would be valid matches:

Foo Foo
Foo Bar
Foo_bar
Foo.bar
Foo, bar
Foo(bar)
Foo(Foo)

If anyone can kindly show me the proper way to do this, I would much appreciate it!

Edit:

It looks like I spoke a little too soon about the first solution below. After some testing and further reading, I see that underscore is a word character, so the word-boundary pattern won't match Foo_bar. I came up with this, which does the trick. Is there a better way?

(\b)(Foo)(\b|_)

regEX.Replace(source, "$1" & newstring & "$3")
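Since (\b) is zero-width, group 1 always captures an empty string, so $1 adds nothing to the replacement. A sketch of one possible simplification, again in Python for easy testing: a lookahead (?=\b|_) checks the character after Foo without consuming it, so the replacement is just the new text. VB6's RegExp supports lookahead, but this exact pattern is untested there; EDITED, LOOKAHEAD and the helper functions are hypothetical names.

```python
import re

# The pattern from the edit. (\b) is zero-width, so group 1 is always "".
EDITED = re.compile(r"(\b)(Foo)(\b|_)")

# Hypothetical simplification: the lookahead tests what follows "Foo"
# (a word boundary or an underscore) without consuming it.
LOOKAHEAD = re.compile(r"\bFoo(?=\b|_)")


def replace_edited(source: str, new: str) -> str:
    # Mirrors regEX.Replace(source, "$1" & newstring & "$3").
    return EDITED.sub(lambda m: m.group(1) + new + m.group(3), source)


def replace_lookahead(source: str, new: str) -> str:
    # Mirrors regEX.Replace(source, newstring); nothing needs to be put back.
    return LOOKAHEAD.sub(lambda m: new, source)


samples = [
    "Foo Foo", "Foo Bar", "Foo_bar", "Foo.bar", "Foo, bar",
    "Foo(bar)", "Foo(Foo)", "FooBar BarFoo",
    "This is a line with Foo\nFoo starts the next line",
]
for s in samples:
    # Both versions should produce the same text for every sample.
    assert replace_edited(s, "Qux") == replace_lookahead(s, "Qux")
    print(repr(replace_lookahead(s, "Qux")))
```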

It works, but seems a little sloppy.


Answers

answered 6 years ago Bohemian #1

Use the "word boundary" expression \b.

Perhaps something as simple as this will do:

(.*)\bFoo\b(.*)

FYI, the word boundary expression \b is a zero-width match between a word character \w and a non-word character [^\w] (or the start or end of the input), in either order; it consumes no input.
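When this pattern is used with Replace, the (.*) groups have a side effect worth checking. A minimal Python sketch (Python stands in for VB6 here; the helper names are hypothetical) comparing it with a bare \bFoo\b, which needs no groups because \b consumes nothing:

```python
import re

# The answer's pattern, used as regEX.Replace(source, "$1" & newstring & "$2").
GREEDY = re.compile(r"(.*)\bFoo\b(.*)")

# Bare word boundaries: nothing is consumed around "Foo",
# so nothing has to be captured and restored.
BARE = re.compile(r"\bFoo\b")


def replace_greedy(source: str, new: str) -> str:
    return GREEDY.sub(lambda m: m.group(1) + new + m.group(2), source)


def replace_bare(source: str, new: str) -> str:
    return BARE.sub(lambda m: new, source)


# (.*) swallows the rest of the line, so each line yields at most one match
# and only the last "Foo" on it is replaced.
print(replace_greedy("Foo Foo", "Qux"))       # Foo Qux
print(replace_bare("Foo Foo", "Qux"))         # Qux Qux
print(replace_bare("Foo(Foo)", "Qux"))        # Qux(Qux)
print(replace_bare("FooBar BarFoo", "Qux"))   # FooBar BarFoo
# Underscore is a word character, so there is no boundary after "Foo" here.
print(replace_bare("Foo_bar", "Qux"))         # Foo_bar
```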


Underscore and digit characters are considered "word characters", so Foo_Bar, Bar_Foo, and Foo123 wouldn't match. To treat any non-letter (including the start and end of input) as the end of a word, use lookarounds that reject an adjacent letter:

(?i)(.*(?<![a-z]))Foo((?![a-z]).*)
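A sketch of the same idea in Python. Here the negative lookarounds exclude a neighbouring letter, (?<![a-z])Foo(?![a-z]), so any non-letter (or the start or end of input) counts as a boundary. VB6's RegExp object (VBScript regular expressions) supports lookahead but not lookbehind, and it does not accept inline modifiers such as (?i); its IgnoreCase property is used instead. The second pattern below is a hypothetical lookbehind-free variant for that case; all names are illustrative.

```python
import re

# Lookaround form: "Foo" must not have a letter immediately before or after it.
# re.IGNORECASE plays the role of the (?i) modifier.
LOOKAROUND = re.compile(r"(?<![a-z])Foo(?![a-z])", re.IGNORECASE)

# Hypothetical lookbehind-free variant: capture the preceding non-letter (or the
# start of the input) and put it back. Only the trailing check is zero-width, so
# the character after one match is still available to the next match.
NO_LOOKBEHIND = re.compile(r"(^|[^a-z])Foo(?![a-z])", re.IGNORECASE)


def replace_lookaround(source: str, new: str) -> str:
    return LOOKAROUND.sub(lambda m: new, source)


def replace_no_lookbehind(source: str, new: str) -> str:
    # Mirrors regEX.Replace(source, "$1" & newstring) with IgnoreCase = True.
    return NO_LOOKBEHIND.sub(lambda m: m.group(1) + new, source)


examples = {
    "Foo Foo": "Qux Qux",
    "Foo(Foo)": "Qux(Qux)",
    "Foo_Bar": "Qux_Bar",
    "Bar_Foo": "Bar_Qux",
    "Foo123": "Qux123",
    "FooBar BarFoo": "FooBar BarFoo",
}
for source, expected in examples.items():
    assert replace_lookaround(source, "Qux") == expected
    assert replace_no_lookbehind(source, "Qux") == expected
print("all examples passed")
```

In VB6 the second pattern would presumably be used with IgnoreCase = True, Global = True and the replacement string "$1" & newstring.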
