python - Why doesn't this regular expression work in all cases?


I have a text file that contains entries like these:

    @MarkWarner Virginia - Mark Warner
    @SenatorLeahy Vermont - Patrick Leahy
    no @SenatorSanders Vermont - Bernie Sanders
    @OrrinHatch Utah - Orrin Hatch
    no @JimDeMint South Carolina - Jim DeMint
    no @SenMikeLee Utah - Mike Lee
    @KayBaileyHutch Texas - Hutchison
    @JohnCornyn Texas - John Cornyn
    @SenAlexander Tennessee - Lamar Alexander

I wrote the following to remove the 'no' and the dash using regular expressions:

    import re

    politicians = open('testfile.txt')
    text = politicians.read()

    # Grab the 'no' votes
    # Should be 11 entries
    regex = re.compile(r'(no\s@[\w+\d+\.]*\s\w+\s?\w+\s?\w+\s-\s\w+\s?\w+)', re.I)
    no = regex.findall(text)

    ## Make the list a string
    newlist = ' '.join(no)

    ## Replace the dashes in the string with a space
    deldash = re.compile(r'\s-*\s')
    a = deldash.sub(' ', newlist)

    # Delete 'no' in the string
    delno = re.compile(r'no\s')
    b = delno.sub('', a)

    # Make the string into a list
    # Problem with @jimdemint
    # South Carolina Jim DeMint
    regex2 = re.compile(r'(@[\w\d\.]*\s[\w\d\.]*\s?[\w\d\.]\s?[\w\d\.]*?\s+?\w+)', re.I)
    lst1 = regex2.findall(b)

    for i in lst1:
        print(i)

When I run the code, it captures the Twitter handles, states and full names that I want, except for the surname of Jim DeMint. I have specified that the regex should ignore case.

Any ideas? Why is this expression not capturing that surname?

It is missing the surname because his state name has two words: South Carolina.
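To make the failure concrete, here is a minimal sketch. The `cleaned` string is a hypothetical stand-in for the asker's text after the 'no' and the dashes have been stripped, and `regex2` is the question's second pattern as best it can be read, so treat both as assumptions rather than the exact original.

```python
import re

# Hypothetical sample: what the string might look like once 'no' and the
# dashes have been removed (not the asker's real file contents).
cleaned = (
    "@SenatorSanders Vermont Bernie Sanders "
    "@JimDeMint South Carolina Jim DeMint "
    "@SenMikeLee Utah Mike Lee"
)

# The question's second pattern, as reconstructed here (an assumption).
regex2 = re.compile(
    r'(@[\w\d\.]*\s[\w\d\.]*\s?[\w\d\.]\s?[\w\d\.]*?\s+?\w+)', re.I
)

matches = regex2.findall(cleaned)
for m in matches:
    print(m)
# @SenatorSanders Vermont Bernie Sanders
# @JimDeMint South Carolina Jim
# @SenMikeLee Utah Mike Lee
```

The pattern only has room for three words after the handle, so a two-word state uses up the slot that the surname would otherwise fill.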

Make your second regex this; it should help:

    (@[\w\d\.]*\s[\w\d\.]*\s?[\w\d\.]\s?[\w\d\.]*?\s+?\w+(?:\s\w+)?)

I added

    (?:\s\w+)?

This is an optional, non-capturing group that matches one whitespace character followed by one or more alphanumeric or underscore characters.

Note that this applies to the cleaned string; it does not properly match input that still contains the dash.
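A quick way to check the amended pattern is sketched below. The sample strings are hypothetical, and the pattern is this answer's regex as reconstructed here, so take it as an illustration under those assumptions.

```python
import re

# Hypothetical cleaned input ('no' and dashes already removed).
cleaned = (
    "@SenatorSanders Vermont Bernie Sanders "
    "@JimDeMint South Carolina Jim DeMint "
    "@SenMikeLee Utah Mike Lee"
)

# Second pattern with the optional trailing word group appended.
fixed = re.compile(
    r'(@[\w\d\.]*\s[\w\d\.]*\s?[\w\d\.]\s?[\w\d\.]*?\s+?\w+(?:\s\w+)?)',
    re.I,
)

fixed_matches = fixed.findall(cleaned)
for m in fixed_matches:
    print(m)
# @SenatorSanders Vermont Bernie Sanders
# @JimDeMint South Carolina Jim DeMint
# @SenMikeLee Utah Mike Lee

# The same pattern on an entry that still has its dash stops before the name.
dashed = "@JimDeMint South Carolina - Jim DeMint"
dashed_matches = fixed.findall(dashed)
print(dashed_matches)  # ['@JimDeMint South Carolina']
```

The optional group cannot run into the next entry, because the next entry starts with `@`, which `\w` does not match.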

Edit: If you want one master regex that captures and splits everything properly once you have removed the 'no' and the dash:

    ((@[\w]+?\s)((?:(?:[\w]+?)\s){1,2})((?:[\w]+?\s){2}))

There you go.

The full match is available in $1, the Twitter handle in $2, the state in $3 and the name in $4.
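In Python the four groups are read with `m.group(1)` through `m.group(4)`. The sketch below assumes a hypothetical cleaned string and the master pattern as reconstructed here. Every word in the pattern must be followed by whitespace, so the sample ends with a trailing space; without it the last entry would not match.

```python
import re

# Hypothetical cleaned input. The trailing space is deliberate: each word
# in the pattern must be followed by a whitespace character.
cleaned = (
    "@SenatorSanders Vermont Bernie Sanders "
    "@JimDeMint South Carolina Jim DeMint "
    "@SenMikeLee Utah Mike Lee "
)

# Group 1: full match, group 2: handle, group 3: state, group 4: name.
master = re.compile(r'((@[\w]+?\s)((?:(?:[\w]+?)\s){1,2})((?:[\w]+?\s){2}))')

rows = []
for m in master.finditer(cleaned):
    handle = m.group(2).strip()
    state = m.group(3).strip()
    name = m.group(4).strip()
    rows.append((handle, state, name))

for row in rows:
    print(row)
# ('@SenatorSanders', 'Vermont', 'Bernie Sanders')
# ('@JimDeMint', 'South Carolina', 'Jim DeMint')
# ('@SenMikeLee', 'Utah', 'Mike Lee')
```

Each group keeps its trailing whitespace, hence the `strip()` calls.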

Each capturing group works as follows:

    (@[\w]+?\s)

This matches an @ sign followed by at least one word character and then a whitespace character.

    ((?:(?:[\w]+?)\s){1,2})

This matches and captures one or two words, which should be the state. It only works because of the next piece, which must have exactly two words.

    ((?:[\w]+?\s){2})

This matches and captures exactly two words, where a word is defined as some word characters followed by a whitespace character.
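The dependence of the state group on the name group can be seen by running the pattern with and without its last piece. This is only a sketch: the entry is a hypothetical one-word-state example, and both patterns are the reconstructed pieces from this answer.

```python
import re

# Hypothetical entry with a one-word state (note the trailing space).
entry = "@SenatorSanders Vermont Bernie Sanders "

# Handle + state only: the greedy {1,2} takes two words if it can.
without_name = re.compile(r'(@[\w]+?\s)((?:(?:[\w]+?)\s){1,2})')

# Handle + state + name: the name group needs exactly two words, which
# forces the state group to back off to a single word here.
with_name = re.compile(
    r'(@[\w]+?\s)((?:(?:[\w]+?)\s){1,2})((?:[\w]+?\s){2})'
)

state_alone = without_name.match(entry).group(2)
state_with_name = with_name.match(entry).group(2)

print(repr(state_alone))      # 'Vermont Bernie '
print(repr(state_with_name))  # 'Vermont '
```

So the state group on its own would swallow the first name; it is the backtracking forced by the two-word name group that keeps one-word states correct.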
