Pandas排序_前嗅大数据

Pandas排序

时间: 2020-03-28来源：OSCHINA

前景提要

Pandas有两种排序方式，它们分别是 - 按标签按实际值
按标签排序
使用sort_index()方法，通过传递axis参数和排序顺序，可以对Series， DataFrame进行排序。默认情况下，按照升序对行标签进行排序。
实例： df = pd.Series(['E','B','C']) print(df.sort_index(axis=0,ascending=False)) unsorted_df = pd.DataFrame(np.random.randn(10,2),index=[1,4,6,2,3,5,9,8,0,7],columns = ['col2','col1']) #按行排列 print (unsorted_df.sort_index(axis=0, ascending=False)) # 按列排列 print (unsorted_df.sort_index(axis=1, ascending=False))
输出： 2 C 1 B 0 E dtype: object col2 col1 9 -0.680375 0.450634 8 0.354761 -0.919791 7 0.539276 -0.416008 6 -0.067286 0.513746 5 -0.191821 -1.265648 4 -1.075135 0.717537 3 -0.436641 0.007743 2 1.002102 -1.133920 1 -0.193714 0.664201 0 -0.495355 -0.727960 col2 col1 1 -0.193714 0.664201 4 -1.075135 0.717537 6 -0.067286 0.513746 2 1.002102 -1.133920 3 -0.436641 0.007743 5 -0.191821 -1.265648 9 -0.680375 0.450634 8 0.354761 -0.919791 0 -0.495355 -0.727960 7 0.539276 -0.416008
按值排序
像索引排序一样，sort_values()是按值排序的方法。它接受一个by参数，它将使用要与其排序值的DataFrame的列名称。
实例： df = pd.Series(['E','B','C']) print(df.sort_values(axis=0,ascending=False)) unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]}) print (unsorted_df.sort_values(by=['col1', 'col2'], ascending=False))
输出： 0 E 2 C 1 B dtype: object col1 col2 0 2 1 3 1 4 1 1 3 2 1 2
排序算法
sort_values()提供了从mergeesort，heapsort和quicksort中选择算法的一个配置。Mergesort是唯一稳定的算法
方法时间工作空间稳定性速度
'quicksort'（快速排序） 'mergesort'（归并排序） 'heapsort'（堆排序）
O(n^2) O(n*log(n)) O(n*log(n))
0 ~n/2 0
否是否
1 2 3

实例： import pandas as pd import numpy as np unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]}) sorted_df = unsorted_df.sort_values(by='col1' ,kind='mergesort') print (sorted_df)
执行上面示例代码，得到以下结果 - col1 col2 1 1 3 2 1 2 3 1 4 0 2 1