Splitting a list with certain parameters in Python using re.findall
import re

def processfile(filename='names.txt', encode='utf-8'):
    # Read the file two lines at a time: player name, then the stats string.
    listofplayers = []
    listofinfo = []
    count = 0
    with open(filename, 'r', encoding=encode) as f:
        for line in f.readlines():
            if count == 0:
                listofinfo.append(line.strip())
                count += 1
            elif count == 1:
                listofinfo.append(line.strip())
                listofplayers.append(listofinfo)
                count -= 1
                listofinfo = []
    return listofplayers

def splitstats(listofplayers):
    newlist = []
    # Only the stats string at index 1 of each player entry is split.
    for item in (i[1] for i in listofplayers):
        # A capital letter followed by any number of lowercase letters.
        m = re.findall('[A-Z][a-z]*', item)
        newlist.append(m)
    print(newlist)

def main():
    lop = processfile()
    splitstats(lop)

if __name__ == '__main__':
    main()
I'm trying to look at stats for soccer. I took the stats from a webpage and am trying to split each player's entry into their position, country, transferred from, transferred to, and the money paid for them.
My names.txt file looks like this:
Donyell Malen
AttackerNetherlandsArsenalAjaxUndisclosed
Petr Cech
GoalkeeperCzech Rep.ArsenalChelsea14million
Scott Sinclair
MidfielderEnglandAston VillaManchester City3.4million
The listofplayers returned by processfile is a list of lists. The player name is at index 0 and the rest of the information is at index 1, like this:
[['Donyell Malen', 'AttackerNetherlandsArsenalAjaxUndisclosed'],
 ['Petr Cech', 'GoalkeeperCzech Rep.ArsenalChelsea14million'],
 ['Scott Sinclair', 'MidfielderEnglandAston VillaManchester City3.4million']]
I'm trying to parse through each item and split up the string at index 1. I found the re.findall() method and have searched the API for an hour, but I still don't have a clear picture of how to separate on capitals (although the code above does that) while keeping two words with a space between them as one string, i.e. "Aston Villa" should be kept together. I also don't know how to keep the fees, i.e. "3.4million" as 3.4 million.
I know this is a pretty long question, but I wanted to give an overview to see whether I'm going wrong or whether I'm on the right track and just need re.findall(). Thanks!
You can use the following pattern:
"(?:[A-Z]|[0-9]+(?:\.[0-9]+)?)[a-z]*(?: [A-Z][a-z]*)*"
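It's pretty complex because it handles the special cases: each match starts with either a capital letter or a number (with an optional decimal part), continues with lowercase letters, and may absorb further capitalized words that follow a single space. You should dig into the documentation for the re module if you are interested in how to write such expressions: https://docs.python.org/2/library/re.html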
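As a rough sketch of how the pattern could be applied, the snippet below runs it over the three stats strings from the question. It assumes the strings are capitalized as on the source page and that the dot in the fee is meant literally (hence `\.`). The names `split_stats`, `split_fee`, `PATTERN` and `FEE` are made up for this illustration, and putting a space into the fee is just one possible reading of the "3.4 million" requirement.

```python
import re

# The pattern from the answer: a capital letter or a number (optionally
# with a decimal part), then lowercase letters, then any further
# capitalized words that are separated by a single space.
PATTERN = re.compile(r"(?:[A-Z]|[0-9]+(?:\.[0-9]+)?)[a-z]*(?: [A-Z][a-z]*)*")

# Hypothetical helper pattern: a number followed by a lowercase unit.
FEE = re.compile(r"([0-9]+(?:\.[0-9]+)?)([a-z]+)")


def split_fee(fee):
    """Turn '3.4million' into '3.4 million'; leave other text unchanged."""
    m = FEE.fullmatch(fee)
    return f"{m.group(1)} {m.group(2)}" if m else fee


def split_stats(info):
    """Split one stats string into its fields, spacing out the fee."""
    # findall returns whole matches because every group is non-capturing.
    fields = PATTERN.findall(info)
    if fields:
        fields[-1] = split_fee(fields[-1])
    return fields


samples = [
    "AttackerNetherlandsArsenalAjaxUndisclosed",
    "GoalkeeperCzech Rep.ArsenalChelsea14million",
    "MidfielderEnglandAston VillaManchester City3.4million",
]
for s in samples:
    print(split_stats(s))
```

Note that the period in "Czech Rep." is not part of any match, so that field comes back as "Czech Rep". If the period matters, the pattern would need to be extended to allow it.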