Python regex: remove all characters except the first 4 digits
+1511 0716 +4915 czechy +3815/0616 port mo, ao _3615 usa *, suv run on flat +4515 port suv *, suv +3215 usa *, suv +4414 +4815 niem _0616 niem * / mo +2115 niem j
I need the first 4 digits of each line.
+3715 niem
Please help.
You haven't described your data very well, but it looks like you have 2 types of lines:
(one or zero characters)(four digits)(other stuff)
or
(other stuff with no set of 4 digits)
I propose using the re package. The documentation for the module in Python 3 is worth reading; after that you should be able to solve these problems on your own in the future.
I'll assume you have the lines in a list (or other iterable) named lines:
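import re

regex = re.compile(r'^.?([0-9]{4})')

for line in lines:
    match = regex.match(line)
    if match:
        # group(1) is the four digits only; group(0) would also include the leading character
        number = match.group(1)
        # do stuff with `number`, which is a string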
This assumes there is only ever at most one character ahead of the four-digit number, and that you don't care about whatever comes afterward.
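As a minimal sketch, the loop can be wrapped in a function. The name leading_numbers and the sample lines are made up for illustration (the sample is only loosely based on the data in the question, whose real line breaks are unknown).

```python
import re

# Optional single leading character, then exactly four digits (captured).
FIRST_FOUR = re.compile(r'^.?([0-9]{4})')

def leading_numbers(lines):
    """Return the captured four digits for every line that matches."""
    numbers = []
    for line in lines:
        match = FIRST_FOUR.match(line)
        if match:
            numbers.append(match.group(1))  # group(1): the digits only
    return numbers

# Hypothetical sample lines; the real input format is an assumption.
sample = ["+1511 0716", "+4915 czechy", "_3615 usa *, suv", "run on flat", "+3715 niem"]
print(leading_numbers(sample))
```

Lines such as "run on flat" are skipped, because they have no four digits within the first five characters.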
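If you wanted the first 4 digits that appear anywhere (with any number of characters in advance), instead use regex = re.compile(r'[0-9]{4}') and call regex.search(line) rather than regex.match(line), because match only looks at the start of the string. The digits are then in match.group(0).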
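A small sketch of that variant, assuming you want the first run of four digits wherever it sits in the line. The function name first_four_anywhere and the test strings are hypothetical.

```python
import re

# No anchor and no capture group: any four consecutive digits will do.
ANY_FOUR = re.compile(r'[0-9]{4}')

def first_four_anywhere(line):
    """Return the first four consecutive digits in the line, or None."""
    match = ANY_FOUR.search(line)  # search scans the whole string
    return match.group(0) if match else None

print(first_four_anywhere("suv run on flat +4515 port"))
print(first_four_anywhere("run on flat"))
```

Note that a longer run of digits such as "123456" would yield "1234", so this only fits if the numbers are never longer than four digits.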
How the regex works
The first regex is ^.?([0-9]{4}). I'll break it down for you, because I'm guessing you're new to regexes.
^ matches the beginning of the line.
. matches any character (except a newline) once.
? says to make the previous match either 0 or 1 times, so .? says "give me at most 1 character, and I don't care what it is".
() parentheses are used for grouping; this tells the regex engine "do the match, but let me access these things on their own".
[] specifies a class of characters; the engine will match 1 character from inside the brackets.
[0-9] is the character class for digits: the - matches everything between 0 and 9, inclusive, in ASCII ordering (I believe).
{n} specifies to repeat the previous thing n times.
[0-9]{4} says "give me 4 digits".
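To see the pieces above in action, here is a short sketch using made-up input strings. It shows the difference between the whole match and the parenthesized group, that .? really is optional, and that {4} needs all four digits.

```python
import re

pattern = r'^.?([0-9]{4})'

m = re.match(pattern, "+3715 niem")
whole = m.group(0)   # everything the pattern matched, leading character included
digits = m.group(1)  # only what the parentheses captured
print(whole, digits)

# .? may match nothing, so a line starting with the digits still works.
no_prefix = re.match(pattern, "0616 niem").group(1)
print(no_prefix)

# {4} requires four digits in a row; two are not enough.
too_short = re.match(pattern, "+37 niem")
print(too_short)
```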
When we put it together as ^.?([0-9]{4}), we're saying to the regex engine: "give me a string that starts at the beginning of the line, might have one character at the beginning, and has 4 digits afterwards. I only care about the digits, though, so let me access them directly."
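If you want that explanation to live next to the pattern, one option (not something the approach requires) is the re.VERBOSE flag, which lets you spread the regex over several lines with comments. The variable names below are only for illustration.

```python
import re

# Same pattern as the compact form, annotated piece by piece.
VERBOSE_FIRST_FOUR = re.compile(r"""
    ^           # beginning of the line
    .?          # at most one character of any kind
    ([0-9]{4})  # exactly four digits, captured as group 1
""", re.VERBOSE)

COMPACT_FIRST_FOUR = re.compile(r'^.?([0-9]{4})')

line = "_0616 niem * / mo"
print(VERBOSE_FIRST_FOUR.match(line).group(1))
print(COMPACT_FIRST_FOUR.match(line).group(1))
```

Both patterns behave the same; the verbose form just ignores the whitespace and comments inside the pattern string.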