Tuesday at 8:56 am
an algorithm, which the authors refer to as DustBuster, that looks at individual sites and tries to detect rules under which similar content is shown under different URLs. For example, in a site where “story?id=” can be replaced by “story_”, we are likely to see in the URL list many different pairs of URLs that differ only in this substring; we say that such a pair of URLs is an instance of “story?id=” and “story_”. The set of all instances of a rule is called the rule’s support. Our first attempt to uncover DUST is therefore to seek rules that ha - Simon McManus
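
To make the idea of a rule's support concrete, here is a minimal sketch in Python. It is not the DustBuster implementation, only a naive illustration of the definition quoted above: given a URL list and a candidate rule, it collects every pair of URLs that differ only in the rule's two substrings. The function name `rule_support` and the example.com URLs are assumptions made up for this example.

```python
def rule_support(urls, alpha, beta):
    """Return the set of (u, v) pairs from `urls` such that replacing one
    occurrence of `alpha` in u with `beta` yields v.

    Hypothetical helper for illustration; not taken from the paper.
    """
    url_set = set(urls)
    support = set()
    for u in url_set:
        # Try every position where alpha occurs in u.
        start = u.find(alpha)
        while start != -1:
            v = u[:start] + beta + u[start + len(alpha):]
            # The pair is an instance only if v also appears in the URL list.
            if v != u and v in url_set:
                support.add((u, v))
            start = u.find(alpha, start + 1)
    return support


# Made-up URL list for a site where "story?id=" and "story_" are interchangeable.
urls = [
    "http://example.com/story?id=1",
    "http://example.com/story_1",
    "http://example.com/story?id=2",
    "http://example.com/story_2",
    "http://example.com/story?id=3",  # no "story_3" counterpart in the list
    "http://example.com/about",
]

support = rule_support(urls, "story?id=", "story_")
for u, v in sorted(support):
    print(u, "<->", v)
print("support size:", len(support))
```

In this toy list the rule has two instances, because "story?id=3" has no matching "story_3" URL. A rule with a large support is a candidate for further checking; the sketch says nothing about whether the paired pages actually carry similar content.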

