Splitting a list with certain parameters in Python using re.findall


    import re

    def processfile(filename='names.txt', encode='utf-8'):
        listofplayers = []
        listofinfo = []
        count = 0
        with open(filename, 'r', encoding=encode) as f:
            for line in f:
                # Lines alternate: a player's name, then that player's stats string
                if count == 0:
                    listofinfo.append(line.strip())
                    count += 1
                elif count == 1:
                    listofinfo.append(line.strip())
                    listofplayers.append(listofinfo)
                    count -= 1
                    listofinfo = []
        return listofplayers

    def splitstats(listofplayers):
        newlist = []
        for item in (i[1] for i in listofplayers):
            # Split the stats string wherever a capital letter starts a new word
            m = re.findall('[A-Z][a-z]*', item)
            newlist.append(m)
        print(newlist)

    def main():
        lop = processfile()
        splitstats(lop)

    if __name__ == '__main__':
        main()

I'm trying to look at soccer stats. I took the stats from a webpage, and I'm trying to split each player's entry into their position, country, the club they transferred from, the club they transferred to, and the money paid for them.

My names.txt file looks like this:

    Donyell Malen
    AttackerNetherlandsArsenalAjaxUndisclosed
    Petr Cech
    GoalkeeperCzech Rep.ArsenalChelsea14million
    Scott Sinclair
    MidfielderEnglandAston VillaManchester City3.4million

The listofplayers returned by processfile is a list of lists, with the player's name at index 0 and the rest of the information at index 1, like this:

    [['Donyell Malen', 'AttackerNetherlandsArsenalAjaxUndisclosed'], ['Petr Cech', 'GoalkeeperCzech Rep.ArsenalChelsea14million'], ['Scott Sinclair', 'MidfielderEnglandAston VillaManchester City3.4million']]
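Since the lines in names.txt strictly alternate between a name and a stats string, the count toggle in processfile can be replaced by pairing consecutive lines. This is a minimal sketch assuming every name line is followed by exactly one stats line; the helper name pair_lines is hypothetical, not part of the original code.

```python
def pair_lines(lines):
    """Pair alternating name/stats lines into [name, stats] lists (hypothetical helper)."""
    # Strip whitespace and skip blank lines so a stray empty line doesn't shift the pairing
    cleaned = [line.strip() for line in lines if line.strip()]
    # Reusing one iterator makes zip() pull two consecutive lines for each pair
    it = iter(cleaned)
    return [[name, stats] for name, stats in zip(it, it)]


sample = [
    "Donyell Malen\n",
    "AttackerNetherlandsArsenalAjaxUndisclosed\n",
    "Petr Cech\n",
    "GoalkeeperCzech Rep.ArsenalChelsea14million\n",
]
print(pair_lines(sample))
# [['Donyell Malen', 'AttackerNetherlandsArsenalAjaxUndisclosed'], ['Petr Cech', 'GoalkeeperCzech Rep.ArsenalChelsea14million']]
```

With the real file, a file object can be passed directly, e.g. inside `with open('names.txt', encoding='utf-8') as f:` call `pair_lines(f)`. Note that zip() silently drops a final name that has no stats line after it.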

I'm trying to parse through each item at index 1 and split it up. I found the re.findall() method and have searched the API docs for an hour, but I still don't have a clear picture of how to separate on capitals (although my code above does that) while keeping two words with a space between them as one string, i.e. "Aston Villa" should be kept together, and how to keep their fees, i.e. "3.4million" as 3.4 million.
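To see why splitting only on capital letters isn't enough, here is what a pattern like [A-Z][a-z]* returns for one of the stats strings from the question: the club names are cut in two, and the fee, which contains no capital letter, is dropped entirely.

```python
import re

stats = "MidfielderEnglandAston VillaManchester City3.4million"

# Each match starts at a capital letter and consumes the lowercase letters after it,
# so the space in "Aston Villa" ends a match and digits are never matched at all
parts = re.findall(r"[A-Z][a-z]*", stats)
print(parts)
# ['Midfielder', 'England', 'Aston', 'Villa', 'Manchester', 'City']
```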

I know this is a pretty long question, but I wanted to give an overview to see if I'm going about this the wrong way, or if I'm on the right track and just need re.findall(). Thanks!

You could use the following pattern:

    r"(?:[A-Z]|[0-9]+(?:\.[0-9]+)?)[a-z]*(?: [A-Z][a-z]*)*"

It's a bit complex because it handles the special cases, so you should dig into the documentation of the re module if you are interested in how to write such expressions: https://docs.python.org/2/library/re.html
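To make the pattern from this answer easier to follow, here is a sketch that writes it with re.VERBOSE so each piece can carry a comment, then applies it to one stats string. The dot in the fee is escaped (`\.`) so it only matches a literal period; an unescaped `.` would match any character. The FIELDS names and the split_stats helper are hypothetical, chosen to match the five pieces the question lists, and the code assumes every stats string contains those pieces in that order.

```python
import re

# The answer's pattern, rewritten with re.VERBOSE so each piece can carry a comment.
# In verbose mode whitespace is ignored, so the literal space is written as [ ].
STAT_RE = re.compile(r"""
    (?:
        [A-Z]                  # a word that starts with a capital letter
      | [0-9]+(?:\.[0-9]+)?    # or a number such as 14 or 3.4 (escaped dot = literal period)
    )
    [a-z]*                     # then any lowercase letters, e.g. the million in 3.4million
    (?:[ ][A-Z][a-z]*)*        # then zero or more space-separated capitalised words, e.g. Aston Villa
""", re.VERBOSE)

# Hypothetical field names, in the order the stats appear in names.txt
FIELDS = ("position", "country", "transferred_from", "transferred_to", "fee")


def split_stats(stats):
    """Split one stats string into a dict keyed by FIELDS (hypothetical helper)."""
    parts = STAT_RE.findall(stats)
    # zip() stops at the shorter sequence, so a malformed line yields fewer keys
    return dict(zip(FIELDS, parts))


print(split_stats("MidfielderEnglandAston VillaManchester City3.4million"))
# {'position': 'Midfielder', 'country': 'England', 'transferred_from': 'Aston Villa',
#  'transferred_to': 'Manchester City', 'fee': '3.4million'}
```

Applied to the Petr Cech line, the same pattern returns 'Czech Rep' without its trailing period, because a period is only accepted between digits; if that period matters, the capital-letter branch would need extending.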

