# H-Index Problem

0 like 0 dislike
52 views

Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.

According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."

Example:

```Input: `citations = [3,0,6,1,5]`
Output: 3
Explanation: `[3,0,6,1,5] `means the researcher has `5` papers in total and each of them had
received `3, 0, 6, 1, 5` citations respectively.
Since the researcher has `3` papers with at least `3` citations each and the remaining
two with no more than `3` citations each, her h-index is `3`.```

Note: If there are several possible values for h, the maximum one is taken as the h-index.

0 like 0 dislike
by Goeduhub's Expert (2.3k points)

The main trick is to count for each possible number of citations, how many times we see this number in our citations. Note, that if number of citations is more than total number of papers N, we can reduce this numer to N and nothing will change. Let me explain my solutoin given test example [3,0,6,1,5].

We create array, which I called buckets = [1, 1, 0, 1, 0, 2], where buckets[i] is number of papers with i citations if i < N and bucket[N] is number of papers with >=N citations.

Now, we create accum array, where accum[0] is number of papers with >=1 citation, accum[1] is number of papers with >=2 citations and so on. When we evaluate it for our example we can see, that it is equal to accum = [4,3,3,2,2]. Note, that we start with 1 citation, not with zero, that is why we need to use accum[1:] in our code.

Finally, we need to go through this array and find the bigest number i, for which accum[i] >= i + 1 and return i + 1, in our example it is equal to 3.

Complexity

Complexity is O(n), though we traverse our data 4 times, and it is not the optimal solution. We can evaluate cumulative sums in place, and compare them in place with i + 1 and keep the index of maximum reached so far, and interrupt when inequality accum[i] >= i + 1 does not hold anymore. Howerer I like cumulative sums and code is more clean in my way.

class Solution:

def hIndex(self, citations):

N = len(citations)

buckets = [0] * (N + 1)

for elem in citations:

buckets[min(elem, N)] += 1

accum = list(accumulate(buckets[1:][::-1]))[::-1]

compar = [accum[i] >= i + 1 for i in range(N)]

return (compar + [0]).index(0)

eg  : ans = Solution()

ans.hIndex([3,0,6,1,5])

Output -

3