
We have a system that periodically (say every 30 minutes) verifies that two locations hold the same data. You can assume the data is made of chunks and each chunk has a unique name. The current approach queries both locations for their chunk names and compares the two lists. Because there are a lot of chunks, the system spends a lot of time fetching chunk names from the database and sending them over to the matcher.

Is there something out there I can use to optimize this, so that we do not need to send the full list of chunk names each time?

If the chunks were static, we could just compute a CRC-32 on each side and compare; only if the checksums did not match would we query the full chunk lists. But in our system chunks can be deleted or added at any time, so we need something like a running checksum, to which we can add or subtract a chunk name. I thought about a Bloom filter, but it will not work for us because it can generate false positives. We need to be sure.
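One way to get a running checksum with add/subtract support is an order-independent set digest: hash each chunk name to a fixed-width integer and keep the sum modulo 2^64. Adding or removing a chunk is then O(1), and both sides only exchange one 64-bit value per scan. This is a minimal sketch, not a drop-in for any particular system; the chunk names used below are made up, and like CRC-32 it can collide (roughly 2^-64 per comparison), so an occasional full scan is still a good safety net.

```python
import hashlib

MASK = (1 << 64) - 1


def chunk_hash(name: str) -> int:
    # Map a chunk name to a 64-bit integer via SHA-256.
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")


class RunningChecksum:
    """Order-independent digest of a set of chunk names.

    digest = sum of per-chunk hashes mod 2**64, so add/remove are O(1)
    and the result does not depend on insertion order.
    """

    def __init__(self):
        self.digest = 0

    def add(self, name: str) -> None:
        self.digest = (self.digest + chunk_hash(name)) & MASK

    def remove(self, name: str) -> None:
        self.digest = (self.digest - chunk_hash(name)) & MASK


# Both locations maintain a digest; the matcher compares two integers
# instead of two full name lists.
a, b = RunningChecksum(), RunningChecksum()
for n in ["chunk-1", "chunk-2", "chunk-3"]:
    a.add(n)
for n in ["chunk-3", "chunk-1", "chunk-2"]:  # same set, different order
    b.add(n)
assert a.digest == b.digest

a.remove("chunk-2")
assert a.digest != b.digest
```

Only when the digests differ do you fall back to fetching and diffing the actual chunk lists, so the common all-in-sync case becomes nearly free.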

  • A CRC-32 can also give a false positive. How sure is your "sure"? Commented Jun 13, 2023 at 23:23
  • @MarkAdler Once in a while is okay, since we scan and match periodically. Say out of 100 scans, one can be a full match. A collision would delay fixing an inconsistency between the two data sources, but it lets the system scale; the lower the collision probability, the better. Do you have something in mind? Commented Jun 14, 2023 at 1:24
  • I think, you can use some kind of Merkle Tree: en.wikipedia.org/wiki/Merkle_tree Commented Jun 14, 2023 at 3:55
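The Merkle-tree idea from the last comment can be sketched roughly as follows: partition the chunk names into buckets (here by hash prefix), hash each bucket, and hash the bucket hashes into a root. If the roots match, the sets match (up to hash collisions); if not, comparing per-bucket hashes narrows the diff to a few buckets instead of the whole list. This is an illustrative sketch, not the questioner's system; the bucket count and names are assumptions.

```python
import hashlib


def merkle_root(names, buckets=4):
    """Return (root_hex, leaf_digests) for a set of chunk names.

    Names are bucketed by hash, each bucket is hashed, and the root is the
    hash of the concatenated bucket digests. Deterministic for a given set.
    """
    groups = [[] for _ in range(buckets)]
    for n in sorted(names):
        idx = int(hashlib.sha256(n.encode()).hexdigest(), 16) % buckets
        groups[idx].append(n)
    leaves = [hashlib.sha256("\0".join(g).encode()).digest() for g in groups]
    root = hashlib.sha256(b"".join(leaves)).hexdigest()
    return root, leaves


# Compare roots first; on mismatch, compare leaves to find the bucket(s)
# that differ and re-fetch only those chunks.
root_a, leaves_a = merkle_root({"chunk-1", "chunk-2", "chunk-3"})
root_b, leaves_b = merkle_root({"chunk-3", "chunk-2", "chunk-1"})
assert root_a == root_b  # same set, same root

root_c, leaves_c = merkle_root({"chunk-1", "chunk-3"})
assert root_a != root_c
bad_buckets = [i for i, (x, y) in enumerate(zip(leaves_a, leaves_c)) if x != y]
```

Unlike the flat running checksum, the tree costs more to maintain per update, but it localizes a mismatch, so recovery after a divergence transfers far less data.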
