c# regex-match backslashes in strings

My suggestion: first find a “safe” character that’s guaranteed not to show up in the original string, like “_”. Replace all backslashes with it, then proceed.

The problem with backslashes is the unnecessary complication. Here I want to match “one or more backslashes”. In the end I need to put 4 backslashes in the pattern to represent that “one”.

var ret = Regex.Replace(@"any number of\\\backslashes", "(.+\\\\+)?(.+)", "$1 - $2");

Alternatively, I could use @ to reduce the complexity: @"(.+\\+)?(.+)"

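Disappointingly, the @ does only a partial job. It removes the C# string-literal layer of escaping, but the regex engine still needs its own escape, so we still need 2 backslashes. Confusing! I’d rather just remember one simple rule and avoid the @ altogether.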
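The same two layers of escaping show up in Java, where there is no @ shortcut to muddy the rule. Here is a minimal sketch, in Java rather than C#, that applies the “safe character” suggestion. The class name BackslashDemo and the helper toSafe are made up for illustration.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // Four backslashes in the source become two characters in the string,
    // which the regex engine reads as one literal backslash.
    // The trailing + makes it "one or more".
    static final Pattern BACKSLASHES = Pattern.compile("\\\\+");

    // Swap each run of backslashes for the "safe" character "_".
    static String toSafe(String s) {
        return BACKSLASHES.matcher(s).replaceAll("_");
    }

    public static void main(String[] args) {
        String input = "any number of\\\\\\backslashes"; // three real backslashes
        System.out.println(BACKSLASHES.pattern()); // what the regex engine sees
        System.out.println(toSafe(input));
    }
}
```

Printing the pattern is a cheap way to check how many backslashes actually reach the regex engine.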


java regex — replace with captured substring but modified

Any time you have a string with lots of x.xx000001 or x.xx99999, it’s probably noise you want to get rid of. Here’s a java solution. Perl can do this in 1 line (at most 2).

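    // The pattern definition is truncated in the original post. It must define three
    // groups: (1) text before the number, (2) the digits ending in 999, (3) text after.
    static final Pattern PATTERN9999 = Pattern

    public static String cleanUp999or000(String orig) {
        Matcher m = PATTERN9999.matcher(orig);
        StringBuffer sb = new StringBuffer();
        String without999 = orig;
        String without999_or_000 = orig;
        try {
            while (m.find()) {
                // bump ...999 up to ...000 so the second pass can strip the zeros
                final long intEndingIn999 = Long.parseLong(m.group(2));
                final long intEndingIn000 = intEndingIn999 + 1;
                m.appendReplacement(sb, m.group(1) + intEndingIn000 + m.group(3));
            }
            m.appendTail(sb); // copy whatever follows the last match
            without999 = sb.toString();
        } catch (NumberFormatException e) {
            without999 = orig;
        } finally {
            // second pass: drop the run of zeros plus the trailing noise digit
            without999_or_000 = without999.replaceAll(
                "(\\d\\.\\d+?)0000+\\d(\\s)", "$1$2");
        }
        return without999_or_000;
    }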
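The core technique is the find() / appendReplacement() / appendTail() loop: capture a substring, compute a modified version in Java, and splice it back in. Below is a stripped-down sketch of just that loop. The pattern, the class name IncrementDemo and the helper bump are my own illustrations, not the truncated PATTERN9999.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IncrementDemo {
    // Hypothetical pattern: a lowercase word (group 1) followed by digits (group 2).
    private static final Pattern WORD_NUM = Pattern.compile("([a-z]+)(\\d+)");

    // Replace every word+number with the same word and the number plus one.
    static String bump(String orig) {
        Matcher m = WORD_NUM.matcher(orig);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            long n = Long.parseLong(m.group(2));
            String replacement = m.group(1) + (n + 1);
            // quoteReplacement guards against $ or \ in the replacement text
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // without this, text after the last match is lost
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bump("port7 and slot19"));
    }
}
```

Forgetting appendTail() is the classic mistake with this idiom, because the output silently loses its ending.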

scanning a multi-line string in java String.matches()

By default, the dot doesn’t match line terminators.

You can modify that behavior with the DOTALL flag. String.matches() has no parameter for flags, but the embedded form (?s) at the start of the pattern has the same effect.
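A quick sketch of the three options on a two-line string. String.matches() takes no flags argument, so the choices are the embedded (?s) flag or an explicit Pattern compiled with Pattern.DOTALL. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class DotallDemo {
    // Default behavior: the dot stops at line terminators.
    static boolean plainMatches(String s) {
        return s.matches(".*END");
    }

    // Embedded flag: (?s) switches on DOTALL inside the pattern itself.
    static boolean inlineFlag(String s) {
        return s.matches("(?s).*END");
    }

    // Explicit Pattern with the DOTALL constant.
    static boolean compiledFlag(String s) {
        return Pattern.compile(".*END", Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        String twoLines = "line one\nline two END";
        System.out.println(plainMatches(twoLines));
        System.out.println(inlineFlag(twoLines));
        System.out.println(compiledFlag(twoLines));
    }
}
```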


lookaround assertions across languages

/(?<!->)\bparentDataStore/ ## is a perl regex with a negative lookbehind. It disqualifies "->parentDataStore"

(class|struct)\s+MyClass\b(?!;) ## is a perl regex for a class definition. The trailing negative lookahead (?!;) ensures we don’t match a forward class declaration.
Below is a Nov 2010 java example:

replaceAll("(?<=</?)MTSMessage", "SIG_Notification")

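The positive lookbehind assertion above, with its optional slash, says to match (and replace) the “MTSMessage” string provided it’s immediately preceded by “<” or “</”. The lookbehind itself consumes nothing, so the “<” or “</” stays in place.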
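Here is the same replaceAll() call wrapped in a small runnable sketch, to show what gets renamed and what is left alone. The class name LookbehindDemo and the helper rename are hypothetical.

```java
class LookbehindDemo {
    // Rename the tag only where it directly follows "<" or "</".
    static String rename(String xml) {
        return xml.replaceAll("(?<=</?)MTSMessage", "SIG_Notification");
    }

    public static void main(String[] args) {
        System.out.println(rename("<MTSMessage>hi</MTSMessage>"));
        System.out.println(rename("text mentioning MTSMessage")); // unchanged
    }
}
```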

LookAhead is simpler than LookBehind. Compare the syntax: (?=...) and (?!...) versus (?<=...) and (?<!...). Some languages only support lookAhead.

I feel negative lookAhead is more useful than positive lookAhead. I feel these zero-width assertions are useful in progressive matches, but I seldom need complex progressive.
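For comparison, here is a rough Java port of the two Perl patterns from the top of this post, one negative lookbehind and one negative lookahead. The class and method names are my own. The only change to the patterns is the doubled backslashes that Java string literals need.

```java
import java.util.regex.Pattern;

class NegativeLookaroundDemo {
    // Negative lookbehind: "parentDataStore" not immediately preceded by "->".
    static final Pattern NOT_VIA_ARROW =
        Pattern.compile("(?<!->)\\bparentDataStore");

    // Negative lookahead: a class/struct header for MyClass not followed by ";".
    static final Pattern DEFINITION =
        Pattern.compile("(class|struct)\\s+MyClass\\b(?!;)");

    static boolean usedWithoutArrow(String line) {
        return NOT_VIA_ARROW.matcher(line).find();
    }

    static boolean isDefinition(String line) {
        return DEFINITION.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(usedWithoutArrow("this.parentDataStore = x"));
        System.out.println(usedWithoutArrow("obj->parentDataStore"));
        System.out.println(isDefinition("class MyClass {"));
        System.out.println(isDefinition("class MyClass;"));
    }
}
```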

If you use lookaround you may want to start with sample code and make incremental changes. Plausible but incorrect lookaround patterns abound. This is a time you need to understand how regex engines work.

* explains how (not) to capture a back-reference in a lookAhead.


group() – progressive matching in java

google interview 1st whiteboard coding question
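import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static final int size = 20;
    static Pattern p = Pattern.compile("[a-zA-Z]+");
    static Matcher m;
    static StringBuffer c; // candidate without extra space
    static String good;

    public static void main(String[] args) {
        test("we all love apples more");
    }

    static void test(String s) {
        m = p.matcher(s);
        c = new StringBuffer();
        good = "";
        // the loop header and the append of group() were lost in the original; reconstructed
        while (m.find()) {
            c.append(m.group()); // next word
            if (c.length() > size) break; // too long, keep the last good candidate
            good = c.toString();
            c.append(" ");
        }
        System.out.println(good + "____");
    }
}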


matching a stuck-record^nopainnogain

a stuck-record is something like “i love fish i love fish i love fish “.

Q: How do you match a stuck record?
A: /(i love fish )\1+/ seems to be the standard solution in literature
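A small Java sketch of that backreference pattern. FISH is the pattern from the answer. ANY_STUCK is my own generalisation (an assumption, not from the literature) that asks whether the whole string is some unit repeated at least twice.

```java
import java.util.regex.Pattern;

class StuckRecordDemo {
    // One copy of the phrase, then the same text again at least once.
    static final Pattern FISH = Pattern.compile("(i love fish )\\1+");

    // Hypothetical generalisation: any unit, repeated two or more times.
    static final Pattern ANY_STUCK = Pattern.compile("(.+?)\\1+");

    // matches() insists that the entire string is made of repeats.
    static boolean isStuck(String s) {
        return ANY_STUCK.matcher(s).matches();
    }

    public static void main(String[] args) {
        String record = "i love fish i love fish i love fish ";
        System.out.println(FISH.matcher(record).find());
        System.out.println(isStuck(record));
        System.out.println(isStuck("no pain no gain"));
    }
}
```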

On the other hand, “no pain no gain” is not a stuck record but a …. excel column list: AB-AC-AD-…?

Q: Will \d{3} match 287 ?
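A: Yes. See p 177 [[ programming perl ]]. The quantifier repeats the pattern, not the text it matched, so each \d is free to match a different digit, just as each dot in .{3} can match a different character.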

Q: will (\d\d){2} match 7193 ?
A: Yes, for the same reason. Only a backreference like \1 demands the same text again.
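The two questions above, plus the backreference contrast, as a runnable sketch. The class and method names are made up for illustration.

```java
class QuantifierVsBackrefDemo {
    // Quantifier: three digits, each free to differ.
    static boolean threeDigits(String s) {
        return s.matches("\\d{3}");
    }

    // Quantified group: two digit pairs, each pair free to differ.
    static boolean twoPairs(String s) {
        return s.matches("(\\d\\d){2}");
    }

    // Backreference: the second pair must repeat the text of the first.
    static boolean samePairTwice(String s) {
        return s.matches("(\\d\\d)\\1");
    }

    public static void main(String[] args) {
        System.out.println(threeDigits("287"));
        System.out.println(twoPairs("7193"));
        System.out.println(samePairTwice("7193"));
        System.out.println(samePairTwice("7171"));
    }
}
```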


j regex to remove trailing alphabets (from portNum)

        Pattern p = Pattern.compile("[a-zA-Z]*$");
        Matcher m = p.matcher(this.portNum);
                "au-" +
                opp.cardSlot + "-" +
                opp.portNum + "-" +
                this.ontSlot + "-" +
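Since the fragment above lost its middle, here is a self-contained sketch of just the stripping step with the same pattern. The class name PortNumDemo and the helper stripTrailingLetters are hypothetical, and I am assuming the original called something like replaceFirst("") or replaceAll("") on the matcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PortNumDemo {
    // Zero or more letters anchored at the end of the input.
    private static final Pattern TRAILING_LETTERS = Pattern.compile("[a-zA-Z]*$");

    // Drop any trailing letters; input without them comes back unchanged.
    static String stripTrailingLetters(String portNum) {
        Matcher m = TRAILING_LETTERS.matcher(portNum);
        return m.replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingLetters("12ab"));
        System.out.println(stripTrailingLetters("7"));
    }
}
```

Because of the *, the pattern also matches the empty string at the end of the input, which is harmless here since the replacement is empty too.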

regex can’t parse xml/html

This is one reason I can understand.

The “context” /dictates/ which rules to apply.

The structure of the document is not understood or taken into account.

Both DOM and SAX know the context and structure.