Sorting |
Putting things into some kind of order is a fundamental technique in Data Processing.
|
We'll be working with the Python function
sorted(). This
function takes a list and returns a new list in sorted order.
Internally, sorted() will compare two elements of a list, and will swap their positions in the list if they are "out of order", namely if the second element is "smaller" than the first. Then it will go on to another pair of elements and compare those, until there are no more elements that are out of order. In comparing two strings, it will compare the first character of each string, and will swap the strings if those characters are out of order. If they're the same character, it will go on to comparing the second character of each string....and so on, until it finds a difference, and uses that to choose the order of the strings. If it runs out of characters in one of the two strings, it will consider the shorter string as smaller. See examples below. In comparing sublists inside of the list to be sorted, it will compare the first element of one sublist to the first element of the other sublist, and try to make a decision on that. If the first elements are the same, then it will look at the 2nd elements, and so on until it finds a difference to make a decision on. If it runs out of elements in one of the two sublists, it will consider the shorter sublist to be smaller. See examples below. |
Examples show
this best: # increasing numerical order: a = [5,3,1,8,-14] b = sorted(a) print(b) [-14, 1, 3, 5, 8] print(a) # note that the list a is unchanged [5, 3, 1, 8, -14] # increasing letter order: c = ['b','f',a','c'] print(sorted(c)) ['a', 'b', 'c', 'f'] # multi-character strings: d = ['hi', 'hit', 'hot', 'ha', 'ha!'] print(sorted(d)) ['ha', 'ha!', 'hi', 'hit', 'hot'] # But be careful. Characters are really compared # according to their ASCII codes. So, all UPPER-CASE # characters are "less" than lower-case characters. e = ['Harry','HARRY','abe'] print(sorted(e)) ['HARRY', 'Harry', 'abe'] # We can also sort lists containing sublists. f = [[5,'Harry'],[5,'Abe'],[2,'Zelda']] print(sorted(f)) [[2, 'Zelda'], [5, 'Abe'], [5, 'Harry']] # Above, it compared the numbers first, and then # when the numbers were equal, it compared the strings |
Examples of sorting tasks: Task 1: Suppose you have a long string of female names, and you want a shorter list of just the unique names in alphabetical order. There's an easy way to do this with another Python datatype called set, but suppose we don't know that. names = 'Mary,Zoe,Ivy,MARY,emma,Steph,EMMA' # create a list name_list = names.split(',') print(name_list) ['Mary','Zoe','Ivy','MARY','emma','Steph','EMMA'] # make the capitalization uniform: for i in range(len(name_list)): name_list[i] = name_list[i].capitalize() print(name_list) ['Mary','Zoe','Ivy','Mary','Emma','Steph','Emma'] # sort it name_sort = sorted(name_list) print(name_sort) ['Emma','Emma','Ivy','Mary','Mary','Steph','Zoe'] # create a new list, eliminating duplicates answer = [name_sort[0]] for i in range(1,len(name_sort)): if name_sort[i] != name_sort[i-1]: answer.append(name_sort[i]) # The answer: print(answer) ['Emma','Ivy','Mary','Steph','Zoe'] Task 2: Suppose you have a long string of female names, and you want them ordered from most frequently duplicated to least frequently duplicated. This is a more involved task. names = 'Mary,Alice,Mary,Emma,Steph,Emma,Emma,Ivy' name_list = names.split(',') name_sort = sorted(names) # the sorted list looks like: print(name_sort) ['Alice','Emma','Emma','Emma','Ivy','Mary','Mary','Steph'] # create a list of little sublists like [2,'Mary'] # and [1,'Ivy'] # with the duplication number followed by the name subs = [[1,name_sort[0]]] for i in range(1,len(name_sort)): if name_sort[i] == subs[-1][1]: subs[-1][0] += 1 else: subs.append([1,name_sort[i]]) # we now have a list that looks like: print(subs) [[1,'Alice'],[3,'Emma'],[1,'Ivy'],[2,'Mary'],[1,'Steph']] # sort it (it will then be in increasing order by # duplication number) subs = sorted(subs) # reverse it, so that it's in decreasing order subs= subs[::-1] # now remove the duplication number so that there's just # a list of names in decreasing order of popularity: answer = [] for asub in subs: answer.append(asub[1]) # Ah...the finished product: print(answer) ['Emma', 'Mary', 'Steph', 'Ivy', 'Alice'] |