Top K Frequent Words Problem

asked Sep 11, 2020 in Coding Questions by Vaibhav98 Goeduhub's Expert (2.3k points)

Given a non-empty list of words, return the k most frequent elements.

Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.

Example 1:

Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words.
    Note that "i" comes before "love" due to a lower alphabetical order.

Example 2:

Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
    with the number of occurrence being 4, 3, 2 and 1 respectively.

Note:

You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
Input words contain only lowercase letters.

Follow up:

Try to solve it in O(n log k) time and O(n) extra space.

1 Answer

answered Sep 12, 2020 by Vaibhav98 Goeduhub's Expert (2.3k points)

Best answer

Store the frequencies of all words in a hash table and then find the k frequent using a heap. Here we would have to use a custom comparator for heap because of this reason :-

suppose you have : - (1,"abc") and (1,"cdb") . First element is frequency and second is the actual word. Now according to the question if we have k = 1 answer should be (1,"abc"). According to the defualt heap comparison, the smallest tuple would be (1,"abc") and would be removed from the heap but this should be retained and "cdb" should be popped. So we develop a custom comparator so that if two frequencies are equal keep the largest string at the top so that it is the first to be removed.

Time complexity :-

N log k since we never have more than k elements in the heap

from collections import defaultdict

import heapq

class heapItem:

def __init__(self, a, b):

self.a = a

self.b = b

def __lt__(self, item):

if self.a == item.a:

return self.b > item.b

else:

return self.a < item.a

class Solution:

def topKFrequent(self, words: List[str], k: int) -> List[str]:

freq = defaultdict(int)

for word in words:

freq[word] += 1

q = []

for key in freq:

cnt = freq[key]

heapq.heappush(q,heapItem(cnt, key))

if len(q) > k:

heapq.heappop(q)

res = []

while(q):

res.append((heapq.heappop(q)).b)

res.reverse()

return res

eg : ans = Solution()

ans.topKFrequent(["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], 4)

Output -

["the", "is", "sunny", "day"]

Learn & Improve In-Demand Data Skills Online in this Summer With These High Quality Courses[Recommended by GOEDUHUB]:-

Best Data Science Online Courses[Lists] on:-

Claim your 10 Days FREE Trial for Pluralsight.

Best Data Science Courses on Datacamp

Best Data Science Courses on Coursera

Best Data Science Courses on Udemy

Best Data Science Courses on Pluralsight

Best Data Science Courses & Microdegrees on Udacity

Best Artificial Intelligence[AI] Courses on Coursera

Best Machine Learning[ML] Courses on Coursera

Best Python Programming Courses on Coursera

Best Artificial Intelligence[AI] Courses on Udemy

Best Python Programming Courses on Udemy

Important Lists:	Important Lists, Exams & Cutoffs	Exams after Graduation	PSUs
Goeduhub:	About Us \| Contact Us \|\| Terms & Conditions \| Privacy Policy \|\| Youtube Channel \|\| Telegram Channel	© goeduhub.com	Social:: \| \|

Top K Frequent Words Problem

Please log in or register to answer this question.

1 Answer

Your comment on this answer:

Related questions