“I bought this guide a few days ago to prepare for my interview with Oracle. Many of the questions they asked me were from this guide. I found this book absolutely great!”
You have 2 arrays each containing a set of numbers. Write an algorithm that gets the elements present in the first array and missing from the second. It should have n complexity.
I’m just writing the pseudo code.here are the steps.
1>Use a hash table
2>scan 1st array and insert the elements in the hashtable(with key as the element and value as 1)
2>scan the 2dn array and for each element in 2nd array which is also present in the hash table, set the corresponding value to 0.
3>scan the hashtable for elements with values 1.
4>COmplexity is of the order O(m+n) where m and n are the number of elements in the the 2 arrays
You mean an array of hashes, not a hashtable (which is a tree). You’re using the hash code as index in the array, so it must be a pretty big array.
Not to mention hash collision.
A better solution is Bloom filters, but you still don’t get 100% accuracy because of hash collision.
MySolution: if the numbers are within a finite range:
- use radix sort to sort both arrays in O(n+m) time.
- scan both arrays in the same time, to find numbers which are just in the first array: O(n+m) time
Global: O(m+n) time.
Why use radix sort? If the range is finite, simply this to make a simple hash into an array. For example, if the range is {a,a+1,…,b} let hash(x)=x-a. There are no collisions and you don’t need to run though the arrays multiple times. With this hash function you can use the rest of Don’s solution.
True, radix sort is also O(n+m), but it is a little overcomplicated, requires running through the lists multiple times, and requires more memory.
You can use a hash. Add the elements in the second array to the hash. ( O(size of th second array) )
Traverse the first array: look up for its elements in the hash. If an element in the first array is found in the hash, it means that it exists in the second array => do nothing; otherwise, if an element in the first array is not found in the hash, it means that it does not belong to the second array => add it to the set of elements present in the first array and missing from the second one. ( constant look-up time => O(size of first array) )
Thus, this can be done in O(size of first array + size of second array).
Here is another approach similar to above, but without using a vector but using a 2D array.
This algorithm does not need to sort the arrays.
Here are the steps of the algorithm.
1. Find out the maximum entry in array1. Let this be lmax1.
2. Find out maximum entry in array2. Let this be lmax2.
3. Create a dynamic 2D array of size (lmax1+1)x(lmax2+1). i.e int array3[lmax1+1][lmax2+1].
4. Loop through array1 and set every entry in arrray3
such as array3[index][0] to 1 where index is a number in array1.
5. Loop through array2 and set every entry in array3 such as array3[0][index] to 1 where index is a number in array2.
6. Now any entry “num” which is repeated in both array1 and array2 will be such a way that array3[num][0]=1 as well as array3[0][num]=1. Any entry which is not repeated will not have this property.
7. Now loop through array1 and skip those entries which have the above property. Also skip any entry whose value is larger than the maximum entry in array2, lmax2. Add the rest of the entries to a new array.
8. Return the new array.
Now the whole code executes in O(n+m) where n is size of array1 and m size of array2.
Here is the Python code for this.
def missing(l1, l2):
# Find the largest value in l1
lmax1, lmax2 = 0, 0
for item in l1:
if item>lmax1: lmax1 = item
for item in l2:
if item>lmax2: lmax2 = item
# Create a 2-d array of size lmax1xlmax2
l3 = [[0]*(lmax2+1)]*(lmax1+1)
# Loop through l1 and put entry at row
# index idx to 1
for item in l1:
l3[item][0] = 1
# Loop through l2 and put entry at column
# index idx to 1
for item in l2:
l3[0][item] = 1
# Now loop through l1 and return a list
# for those indices who do not have
# both [idx][0] and [0][idx] set to 1
l4 = []
for item in l1:
# If this item is greater than the maximum
# in l2, it is not repeated…
if item>lmax2: l4.append(item)
if l3[item][0]==1 and l3[0][item]==1:
continue
l4.append(item)
It is similar to removing duplicate elements from both the arrays.Construct the a binary tree for all the numbers in array.If there is a duplicate, remove the element from A.
The complexity will be O(n) only.
I’m just writing the pseudo code.here are the steps.
1>Use a hash table
2>scan 1st array and insert the elements in the hashtable(with key as the element and value as 1)
2>scan the 2dn array and for each element in 2nd array which is also present in the hash table, set the corresponding value to 0.
3>scan the hashtable for elements with values 1.
4>COmplexity is of the order O(m+n) where m and n are the number of elements in the the 2 arrays
~Don
Don.
You mean an array of hashes, not a hashtable (which is a tree). You’re using the hash code as index in the array, so it must be a pretty big array.
Not to mention hash collision.
A better solution is Bloom filters, but you still don’t get 100% accuracy because of hash collision.
MySolution: if the numbers are within a finite range:
- use radix sort to sort both arrays in O(n+m) time.
- scan both arrays in the same time, to find numbers which are just in the first array: O(n+m) time
Global: O(m+n) time.
I too had the idea of hash map. How about this:
a [1,2,3…………n]
b [2,4,12,……….n]
for(i=0;i
I catch the idea of c32’s solution.
[pseudo code]:
array_a[n] ;
array_b[m] ;
array_result[n] ;
sort(array_a) && sort(array_b) ; //we can do this in O(N + M) time
for (int i = 0, j = 0, r = 0; i array_b[j])
j++ then continue to loop;
if (array_a[i] == array_b[j])
i++, j++ then continue to loop;
if (array_a[i]
Much easier:
let’s say A,B are vectors, like in Matlab.
B=A-B;
A[find(B==0)] are those who are equal.
(A+B)[find(B!=0)] are the rest.
Why use radix sort? If the range is finite, simply this to make a simple hash into an array. For example, if the range is {a,a+1,…,b} let hash(x)=x-a. There are no collisions and you don’t need to run though the arrays multiple times. With this hash function you can use the rest of Don’s solution.
True, radix sort is also O(n+m), but it is a little overcomplicated, requires running through the lists multiple times, and requires more memory.
You can use a hash. Add the elements in the second array to the hash. ( O(size of th second array) )
Traverse the first array: look up for its elements in the hash. If an element in the first array is found in the hash, it means that it exists in the second array => do nothing; otherwise, if an element in the first array is not found in the hash, it means that it does not belong to the second array => add it to the set of elements present in the first array and missing from the second one. ( constant look-up time => O(size of first array) )
Thus, this can be done in O(size of first array + size of second array).
If we are allowed to assume all numbers are integers and within a relatively small range (say lower than 10^6):
I’ll name A and B the two given vectors.
0) create vector of booleans V, initialized to false
1) for all the elements a in A,
do V[a] = true
2) for all the elements b in B,
do V[b] = false
3) the solution is, then:
all the indexes i, such that V[i] == true
if negative integers are also allowed, the algorithm can be easily adapted. would it be easily adapted too for real numbers with a limited precision
memory use is the obvious drawback of the algorithm, but it’s linear in time
Note: the size of V is max(max(A), max(B))
Here is another approach similar to above, but without using a vector but using a 2D array.
This algorithm does not need to sort the arrays.
Here are the steps of the algorithm.
1. Find out the maximum entry in array1. Let this be lmax1.
2. Find out maximum entry in array2. Let this be lmax2.
3. Create a dynamic 2D array of size (lmax1+1)x(lmax2+1). i.e int array3[lmax1+1][lmax2+1].
4. Loop through array1 and set every entry in arrray3
such as array3[index][0] to 1 where index is a number in array1.
5. Loop through array2 and set every entry in array3 such as array3[0][index] to 1 where index is a number in array2.
6. Now any entry “num” which is repeated in both array1 and array2 will be such a way that array3[num][0]=1 as well as array3[0][num]=1. Any entry which is not repeated will not have this property.
7. Now loop through array1 and skip those entries which have the above property. Also skip any entry whose value is larger than the maximum entry in array2, lmax2. Add the rest of the entries to a new array.
8. Return the new array.
Now the whole code executes in O(n+m) where n is size of array1 and m size of array2.
Here is the Python code for this.
def missing(l1, l2):
# Find the largest value in l1
lmax1, lmax2 = 0, 0
for item in l1:
if item>lmax1: lmax1 = item
for item in l2:
if item>lmax2: lmax2 = item
# Create a 2-d array of size lmax1xlmax2
l3 = [[0]*(lmax2+1)]*(lmax1+1)
# Loop through l1 and put entry at row
# index idx to 1
for item in l1:
l3[item][0] = 1
# Loop through l2 and put entry at column
# index idx to 1
for item in l2:
l3[0][item] = 1
# Now loop through l1 and return a list
# for those indices who do not have
# both [idx][0] and [0][idx] set to 1
l4 = []
for item in l1:
# If this item is greater than the maximum
# in l2, it is not repeated…
if item>lmax2: l4.append(item)
if l3[item][0]==1 and l3[0][item]==1:
continue
l4.append(item)
return l4
Of course, in the last loop,
if item>lmax2: l4.append(item)
needs to be replaced with,
if item>lmax2:
l4.append(item)
continue
It is similar to removing duplicate elements from both the arrays.Construct the a binary tree for all the numbers in array.If there is a duplicate, remove the element from A.
The complexity will be O(n) only.
Leave an Answer/Comment