# Top K Frequent Words Problem

0 like 0 dislike
203 views

Given a non-empty list of words, return the k most frequent elements.

Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.

Example 1:

```Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words.
Note that "i" comes before "love" due to a lower alphabetical order.
```

Example 2:

```Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
with the number of occurrence being 4, 3, 2 and 1 respectively.
```

Note:

1. You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
2. Input words contain only lowercase letters.

1. Try to solve it in O(n log k) time and O(n) extra space.

0 like 0 dislike
by Goeduhub's Expert (2.3k points)

Store the frequencies of all words in a hash table and then find the k frequent using a heap. Here we would have to use a custom comparator for heap because of this reason :-

suppose you have : - (1,"abc") and (1,"cdb") . First element is frequency and second is the actual word. Now according to the question if we have k = 1 answer should be (1,"abc"). According to the defualt heap comparison, the smallest tuple would be (1,"abc") and would be removed from the heap but this should be retained and "cdb" should be popped. So we develop a custom comparator so that if two frequencies are equal keep the largest string at the top so that it is the first to be removed.

Time complexity :-

N log k since we never have more than k elements in the heap

from collections import defaultdict

import heapq

class heapItem:

def __init__(self, a, b):

self.a = a

self.b = b

def __lt__(self, item):

if self.a == item.a:

return self.b > item.b

else:

return self.a < item.a

class Solution:

def topKFrequent(self, words: List[str], k: int) -> List[str]:

freq = defaultdict(int)

for word in words:

freq[word] += 1

q = []

for key in freq:

cnt = freq[key]

heapq.heappush(q,heapItem(cnt, key))

if len(q) > k:

heapq.heappop(q)

res = []

while(q):

res.append((heapq.heappop(q)).b)

res.reverse()

return res

eg : ans = Solution()

ans.topKFrequent(["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"],  4)

Output -

["the", "is", "sunny", "day"]

Learn & Improve In-Demand Data Skills Online in this Summer With  These High Quality Courses[Recommended by GOEDUHUB]:-

Best Data Science Online Courses[Lists] on:-

Claim your 10 Days FREE Trial for Pluralsight.