primitive regex matcher#Facebook

Q1: write a stateless utility function to take in any string consisting of a-z, and return a string with all the original characters appearing once only. All 2nd and subsequent occurrences should be removed. eg: abcbdc → abcd

They don’t mind c/c++/java, but I feel they prefer c-string.

I don’t want to new up a string in my utility, so I decided to take in an output char* buffer which should be large enough.

Q2: Suppose there’s a regex grammar that can accept exactly 3 constructs
1) literals — any combination of abcdefghijkljmnopqrstuvwxyz,
2) dot (.) that matches any one character
3) asterisk — a standard quantifier

For example,

“a*” can match empty string, a single a, or aa, or aaa etc.

“.*” matches any string including the empty string.

“a*b*a” should match “a” and also match “aa”, or “ba”, or “aaa”
Now, write a utility function bool match(char const * regex, char const * S);

Note entire string S must match entire regex. We don’t want to get a false positive when only a substring of S matches the regex.
—– Here are some tips —–
Easy part — if regex ends in literals, then they must “eat up” the tail of S exactly. After this step, if regex ends with a dot, it will just eat up the last char of S. After this step, regex definitely ends in an asterisk.

Now assume dot is not allowed in regex. We will deal with the dot later.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s