If you’re a developer, the term “regular expression” should at least sound familiar. But if you’ve never had to write code that uses them (for example, code in a scripting language), you may not know how to write or use regular expressions. And, if you’re like I was, that doesn’t bother you. After all, regular expressions can get confusing and complicated very quickly. So, it’s probably better to just leave them alone until you need them for some code, right?
Sure, maybe. But consider this: How many times have you had to do a tedious, repetitive series of find/replace operations with various parameters on a huge block of code with obvious patterns? Were you aware that Visual Studio supports regular expressions in Find/Replace operations? In fact, lots of IDEs and beefed-up text editors support regular expressions. As an example, take a look at the following block of code:
class __declspec(novtable) ITimelineStateAccess
{
public:
/// Create the Timeline state
/// Returns: true if creation succeeded
/// false if the state already exists
virtual bool CreateTimelineState( Timeline* timeline ) = 0;
/// Retrieves state of timeline by ptr of timeline
virtual const TimelineState* GetTimelineState( Timeline* timeline ) = 0;
/// Returns true if the start frame was successfully set, false otherwise
virtual bool SetSelectedRangeStartFrame( Timeline* timeline,
CSMediaLib::CSTime frame ) = 0;
/// Returns true if the end frame was successfully set, false otherwise
virtual bool SetSelectedRangeEndFrame( Timeline* timeline,
CSMediaLib::CSTime frame ) = 0;
/// Sets the timeline state observer
virtual bool SetTimelineStateObserver( Timeline* timeline,
IObjectStateObserver* observer ) = 0;
/// Sets a flag telling whether a media is selected or not
virtual void SetIsMediaSelected( ObjectId mediaId, bool selected ) = 0;
/// Clears the currently selected media
virtual void ClearSelectedMedia() = 0;
/// Gets all media ids at a given time from all unlocked tracks
virtual std::list GetMediaIdsAtTime( CSMediaLib::CSTime time ) = 0;
};
Now, let’s say I wanted to make all the functions non-virtual. Well, that’s a simple find/replace. Simply replace “virtual ” and “ = 0” with “”.
Now, what if I wanted to swap the parameters on all the functions with two parameters (for the sake of emphasis, imagine there are more than a few such functions)? Well, if they all take the same parameter types, with the same names, then it’s another simple find/replace. But if they all take different types, with different names, then I’ve got a separate find/replace for each of them. At this point, I might as well do it by hand.
Alternatively, I could use a regular expression to match the specific type of text I’m looking for. The following regular expression will do the trick:
\(.+,\n@.+\)
Since I’m assuming some readers don’t know how to use regular expressions, I’ll explain what’s happening. Here’s a slightly separated view, to give you an idea of what the individual parts are:
\(   .+   ,   \n@   .+   \)
\( and \) match the left and right parentheses (they are operators in regular expressions, so they have to be escaped with a backslash).
The period (.) matches any character except a line break. Adding a + sign after it means there must be one or more of them. Also, the + operator is “maximal”, meaning it will try to match as many characters as it can, provided the rest of the regular expression will still match.
The comma is the most literal part of the expression, denoting that we expect exactly one comma at this point in the match.
\n matches a line break. The @ operator matches zero or more of the preceding item, so we expect to find zero or more line breaks after the comma. This helps us match function prototypes that are broken up across two lines for readability. The @ operator is “minimal”, so it will try to match as few line breaks as it can. This is a good thing because we want to match at most one.
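To try the pieces outside the editor, here is a minimal sketch in Python. It assumes Python's `re` syntax rather than Visual Studio's: the minimal zero-or-more operator `@` is spelled `*?` there, and the rest of the pattern carries over unchanged. The sample text is a handful of prototypes copied from the interface above.

```python
import re

# Python spells the article's minimal "zero or more" operator (@) as *?.
# The other parts of the pattern are written the same way.
TWO_PARAM = re.compile(r"\(.+,\n*?.+\)")

# A few prototypes copied from the interface above (sample input only).
source = """virtual bool CreateTimelineState( Timeline* timeline ) = 0;
virtual bool SetSelectedRangeStartFrame( Timeline* timeline,
CSMediaLib::CSTime frame ) = 0;
virtual void SetIsMediaSelected( ObjectId mediaId, bool selected ) = 0;
virtual void ClearSelectedMedia() = 0;
"""

# Only parameter lists that contain a comma should be found,
# whether or not they are split across two lines.
matches = TWO_PARAM.findall(source)
for m in matches:
    print(repr(m))
```

The one-parameter and zero-parameter prototypes are skipped because the pattern cannot find a comma between their parentheses on the same line.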
If you put all of this together, you can see how that expression will match all of the two-parameter function prototypes in the above code. Now, how about swapping those parameters?
This is where we get into “tagged expressions”. This is a feature I know is supported by Visual Studio; other editors likely have a similar idea. A tagged expression sets aside a part of the match for later use. In Visual Studio, you tag an expression by surrounding it with { }. Later, you can insert that expression by numerical reference. Let’s tag some things in our first example:
\({.+}{,\n@}{.+}\)
Now we have the following tagged expressions:
1. Whatever was matched by the first .+
2. The comma and line break (if any)
3. Whatever was matched by the second .+
You can use tagged expressions within the Find and Replace boxes in the Visual Studio Find/Replace dialog by inserting a \#, where # is the number representing which tagged expression you want. In the Replace box, you can also use \0 to insert the entire matched result of the regular expression. So to swap our parameters, we would use the following as our replace expression:
( \3\2\1 )
Notice that we don’t have to escape the parentheses in a Replace expression. In fact, most operators don’t apply in Replace expressions. So if you try to use them, they will be inserted as-is. Also, if you’re clever, you may have noticed that the second .+ will include all the white space leading up to the second parameter. So when that gets inserted in place of the first parameter, it will really mess up the formatting. But that’s a trivial detail that could be fixed with a minor change to the expression. I’ll leave that as an exercise.
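The same swap can be sketched in Python, assuming its `re` module as a stand-in for the Find/Replace dialog. The `{ }` tags become `( )` capture groups, `@` becomes `*?`, and the replace expression is the one shown above. The helper name `swap_parameters` is made up for this illustration.

```python
import re

# Visual Studio's { } tags correspond to ( ) capture groups in Python,
# and the minimal @ operator corresponds to *?.
FIND = re.compile(r"\((.+)(,\n*?)(.+)\)")

# Same replace expression as in the article: third tag, second tag, first tag.
REPLACE = r"( \3\2\1 )"


def swap_parameters(text: str) -> str:
    """Swap the two parameters of every matched two-parameter prototype."""
    return FIND.sub(REPLACE, text)


one_line = "virtual void SetIsMediaSelected( ObjectId mediaId, bool selected ) = 0;"
two_lines = (
    "virtual bool SetSelectedRangeStartFrame( Timeline* timeline,\n"
    "CSMediaLib::CSTime frame ) = 0;"
)

print(swap_parameters(one_line))
print(swap_parameters(two_lines))
```

Running it shows the formatting problem mentioned above: the whitespace captured along with each parameter travels with it, so the swapped prototypes end up with stray spaces around the comma and parentheses.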
As you can see, regular expressions can be very powerful and can save you a lot of time. But there can be downsides. If it takes you longer to figure out the regular expression for one single find/replace than it does to do the multiple repetitive version, it’s obviously not worth it. They can also be tricky depending on what you are trying to match. Sometimes they may do things you don’t expect, like match the whole document because the expression wasn’t strict enough. Judicious use of minimal/maximal operators can help out with that. Of course, as with anything, regular expressions get much easier to write with experience. And then there’s the fact that different editors will often use different operators, so you may have to relearn little bits and pieces to cope with the fact that they aren’t standardized (this page will help with Visual Studio’s Find/Replace dialog).
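The difference between maximal and minimal operators is easiest to see on a small input. The sketch below uses Python's `re` module as an assumption, where `+` is maximal and `+?` is its minimal counterpart. The sample line is invented for the illustration.

```python
import re

# An invented sample line with two separate parenthesized groups.
line = "f( a ); g( b );"

# Maximal: .+ grabs as much as it can, so the match runs from the first
# opening parenthesis to the LAST closing parenthesis on the line.
greedy = re.search(r"\(.+\)", line).group(0)

# Minimal: .+? grabs as little as it can, so the match stops at the
# FIRST closing parenthesis.
lazy = re.search(r"\(.+?\)", line).group(0)

print(repr(greedy))
print(repr(lazy))
```

When a pattern swallows far more text than intended, swapping a maximal operator for a minimal one (or making the pattern stricter) is usually the first thing to try.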
As a final example, see if you can figure out the Find/Replace expressions to turn the following interface definition into the (mostly done) mock definition listed below it, excluding the class declaration part:
interface __declspec(novtable) IMediaFrame
{
public:
virtual void Initialize( LONG width, LONG height ) = 0;
virtual BOOL IsInitialized() = 0;
virtual void Uninitialize() = 0;
virtual LONG Width() = 0;
virtual LONG Height() = 0;
virtual double Opacity() = 0;
virtual void SetOpacity( double opacity ) = 0;
virtual void SetTransform( const CSMediaLib::TransformMatrix t ) = 0;
virtual void ApplyTransform( const CSMediaLib::TransformMatrix t ) = 0;
virtual const CSMediaLib::TransformMatrix GetTransform() = 0;
virtual BOOL IsDirty() = 0;
virtual void ClearDirty() = 0;
};
class gmock_IMediaFrame : public IMediaFrame
{
public:
MOCK_METHOD0( Initialize, void ( LONG width, LONG height ));
MOCK_METHOD0( IsInitialized, BOOL ( ));
MOCK_METHOD0( Uninitialize, void ( ));
MOCK_METHOD0( Width, LONG ( ));
MOCK_METHOD0( Height, LONG ( ));
MOCK_METHOD0( Opacity, double ( ));
MOCK_METHOD0( SetOpacity, void ( double opacity ));
MOCK_METHOD0( SetTransform, void ( const CSMediaLib::TransformMatrix t ));
MOCK_METHOD0( ApplyTransform, void ( const CSMediaLib::TransformMatrix t ));
MOCK_METHOD0( GetTransform, const CSMediaLib::TransformMatrix ( ));
MOCK_METHOD0( IsDirty, BOOL ( ));
MOCK_METHOD0( ClearDirty, void ( ));
};
If you don’t want to do it, here’s the answer.
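For readers working outside Visual Studio, here is one possible approach sketched in Python. It is not the linked answer, only an assumed translation of the same idea into Python's `re` syntax: capture the return type, the method name, and the parameter list, then rearrange them in the replacement. The sample input is three lines taken from the interface above.

```python
import re

# Assumed Python translation of the idea, not the article's linked answer.
#   group 1: return type
#   group 2: method name
#   group 3: parameter list (may be empty)
# The optional space after the opening parenthesis is consumed outside
# group 3 so that empty and non-empty parameter lists format the same way.
FIND = re.compile(r"virtual (.+) (\w+)\( ?(.*)\) = 0;")
REPLACE = r"MOCK_METHOD0( \2, \1 ( \3));"

interface = """virtual void Initialize( LONG width, LONG height ) = 0;
virtual BOOL IsInitialized() = 0;
virtual const CSMediaLib::TransformMatrix GetTransform() = 0;
"""

mock = FIND.sub(REPLACE, interface)
print(mock)
```

As in the "mostly done" listing above, every line comes out as `MOCK_METHOD0`. The digit in that macro name is the number of arguments, so methods that take parameters still need it adjusted by hand or with a second pass.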
I know that I’ve often used Regular Expressions (alright, fine; “regexes”) to avoid a bunch of typing and repetition. I hope others will find them useful as well.
Randy Schott graduated from Michigan State University in 2007 with a Bachelor’s degree in Computer Science and Engineering. Since then, he has been a software engineer at TechSmith, working on the Camtasia Studio team. His development interests center mostly around multimedia, with a soft spot for audio and DSP. Outside of the office, he is an avid musician in multiple genres and occasionally dabbles in gaming.
