python - Finding the most similar list out of a collection of lists, online

I have a collection of lists that each represent a path on a graph:

list1 = [1,2,3,6] # directed edges 1->2, 2->3, 3->6
list2 = [8,3,5,6]
list3 = [9,1,3,4]
list4 = [7,8,1,4]

I also have the adjacency matrix of the graph.

At each time step I am given an edge, e.g. time step 0: [1,2], time step 1: [3,6]. At each step I have to find the most similar list, taking all previous time steps into account; that is, the list whose edges are the most completely covered so far.

What is an efficient way to do it?

I tried a naive approach of comparing each incoming edge to every edge in every list, but since I have a large number of lists, each with a large number of edges, this is too slow.
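For reference, the naive version looks roughly like this (a simplified sketch of what I mean; most_complete and seen_edges are names I made up):

# Naive approach: on every incoming edge, rescan every list.
# With K incoming edges and E total edges across all lists, this is O(K*E).
seen_edges = set()  # all distinct edges received so far

def most_complete(all_lists, new_edge):
    seen_edges.add(tuple(new_edge))
    best_score, best_names = -1.0, []
    for name, path in all_lists.items():
        path_edges = list(zip(path, path[1:]))
        score = sum(e in seen_edges for e in path_edges) / len(path_edges)
        if score > best_score:
            best_score, best_names = score, [name]
        elif score == best_score:
            best_names.append(name)
    return best_names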

Update: here is an example input and output at each time step.

time step 0: input [1,2], output: list1

time step 1: input [8,3], output: list1, list2 # equally complete

time step 2: input [3,6], output: list1

Update 2: Thanks to @Nuclearman, I coded the solution (perhaps naively) as follows:

list1 = [1,2,3,6] # directed edges 1->2, 2->3, 3->6
list2 = [8,3,5,6]
list3 = [9,1,3,4]
list4 = [7,8,1,4]

lists_dict = {
    'list1': list1,
    'list2': list2,
    'list3': list3,
    'list4': list4
}

# number of edges in each list (a path of n nodes has n - 1 edges)
edges = {name: len(path) - 1 for name, path in lists_dict.items()}

# how many of each list's edges have been seen so far
covered_edges = {name: 0 for name in lists_dict}

# nested hash mapping each directed edge node1 -> node2 to the list
# containing it; this assumes each edge belongs to exactly one list,
# as in this example (a shared edge would overwrite the earlier entry)
graph = {}
for list_name, path in lists_dict.items():
    for node1, node2 in zip(path, path[1:]):
        graph.setdefault(node1, {})[node2] = list_name

times = [[1,2],[3,5],[5,6],[8,1],[7,8]]
for time in times:
    edge_in_list = graph[time[0]][time[1]]  # name of the list owning this edge
    covered_edges[edge_in_list] += 1
    print(covered_edges)

    # fraction of each list's edges covered so far
    completeness = {name: covered_edges[name] / edges[name]
                    for name in lists_dict}

    mx = max(completeness.values())
    max_list = [k for k, v in completeness.items() if v == mx]
    print(max_list)
    print('')
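One caveat with the code above: graph keeps a single list name per edge, so an edge shared by two lists would only credit the last one. A variant that credits every list containing an edge (a sketch using a defaultdict of names per edge, reusing lists_dict, edges and times from above) could look like:

from collections import defaultdict

# Map each directed edge to *all* lists containing it, so an edge
# shared by several lists credits each of them.
edge_to_lists = defaultdict(list)
for list_name, path in lists_dict.items():
    for node1, node2 in zip(path, path[1:]):
        edge_to_lists[(node1, node2)].append(list_name)

covered = dict.fromkeys(lists_dict, 0)
for time in times:
    for name in edge_to_lists.get(tuple(time), []):  # [] if edge is unknown
        covered[name] += 1
    completeness = {name: covered[name] / edges[name] for name in lists_dict}
    mx = max(completeness.values())
    print([k for k, v in completeness.items() if v == mx])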

1 Answer


Try using an adjacency list, set up as a nested hash, to represent the graph.

I.e., the entire example you have could be set up this way (I don't recall if this is valid Python):

graph = {
  1: {2: [1], 3: [3], 4: [4]},
  2: {3: [1]},
  3: {6: [1], 5: [2], 4: [3]},
  5: {6: [2]},
  7: {8: [4]},
  8: {3: [2], 1: [4]},
  9: {1: [3]},
}

Then you just keep a tally of how many edges remain in each list, and store the tallies in a data structure with O(log N) or better find-min (or find-max; just adjust the key), lookup, insert, and delete. You may need to do some math depending on how you define completeness: for instance, store both the total edges and the covered edges, and use [(total - covered) / total, list #] as the key for the data structure. That way you can quickly find the best list even if several lists are tied on completeness. For the result you want, return all lists with the highest completeness.
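A minimal sketch of that bookkeeping (names are mine; Python's heapq is a min-heap, so the key is negated, and stale entries are skipped lazily on read):

import heapq

# covered[name] / total[name] is the completeness of each list.
covered = {'list1': 0, 'list2': 0, 'list3': 0, 'list4': 0}
total = {'list1': 3, 'list2': 3, 'list3': 3, 'list4': 3}

# Max-heap of (negated completeness, list name); each update pushes a
# fresh entry in O(log L), and outdated entries are discarded on read.
heap = [(0.0, name) for name in covered]
heapq.heapify(heap)

def update(name):
    covered[name] += 1
    heapq.heappush(heap, (-covered[name] / total[name], name))

def best():
    while True:
        neg_score, name = heap[0]
        if -neg_score == covered[name] / total[name]:
            return name, -neg_score  # entry is current
        heapq.heappop(heap)          # stale entry, discard

Returning all tied lists takes slightly more work (pop every entry sharing the top key, then push the current ones back), or you can fall back to a plain dict scan as in the question's update.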

The above graph lets you quickly determine which list(s) each edge belongs to; you then look up that edge in the remaining counts and decrease the count by one for each such list. E.g. you can see graph[1][2] is list 1, graph[8][3] is list 2, and graph[3][6] is list 1 as well.

For performance, you may also want to keep a set of already-seen edges and skip any edge you have seen before, though whether that is needed, and how repeats should be handled, depends on your requirements.
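That dedup step is a small wrapper around a set (a sketch; it assumes the nested hash above and an update function keyed consistently, like the one sketched earlier):

# Skip edges that were already processed so repeats don't count twice.
seen = set()

def process(edge):
    key = (edge[0], edge[1])
    if key in seen:
        return  # already counted
    seen.add(key)
    # graph[u][v] is the list of list ids containing edge u->v
    for list_id in graph.get(edge[0], {}).get(edge[1], []):
        update(list_id)  # assumes covered/total/heap are keyed by these ids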

Performance is proportional to the number of lists you need to update, making it output-sensitive. If the example provided is anything to go on, the number of list counts you need to update for each incoming edge is likely very small compared to the number of lists. Say there are E total edges across all L lists, you process K edges online, and those K edges trigger a total of A list updates (A is an output-sensitive quantity that depends on how much the lists overlap; in your example each edge belongs to a single list, so there is no overlap, but it is unclear whether that holds with more lists and edges). Then the performance is O(E + K + A log L), I believe: building the graph is O(E) preprocessing, and each of the A list updates requires an O(log L) lookup to find and update that list's count. That seems essentially optimal, and likely much better than the O(K*E) you currently have, unless overlap (A) is extremely high.

