How to output duplicates in Yahoo! Pipes
So this morning, I found a refutation (http://lifehacker.com/5259540/feedflix-notifies-you-when-netflix-movies-are-available-to-stream?t=12906194#c12906194) of my claim that I can use Yahoo! Pipes (http://pipes.yahoo.com/pipes/pipe.info?_id=wppyFmlE3hGYpHmADoSbGg) to be notified when DVDs in my Netflix (http://www.netflix.com/) queue become available to View Instantly. To be honest, I hadn't actually tested it when I made the claim. I had just worked it out in my head.
Turns out I was right (so MCWHAMMER can suck it!) but it was not nearly as simple as I'd thought, and as far as I can find, no one else has published a method for outputting dupes in Yahoo! Pipes (http://pipes.yahoo.com/pipes/pipe.info?_id=wppyFmlE3hGYpHmADoSbGg). So I guess I'll be the first.
First, why would anyone want to output duplicate items from RSS feeds? Well, there will always be some reason to find where two groups intersect. Maybe you want to see which of your blog entries someone is copying. In my case, I want to see which of my Netflix DVDs are newly available to View Instantly. Here's how I did it:
(1) and (2): Log in to Netflix and get the URLs for the "New choices to watch instantly" RSS feed and your personalized queue RSS feed. Order makes a difference because I'm matching (1) against (2). That's why they're fetched separately and then combined in order (3).
The merged feed is then split (4) into two identical feeds. One is sent through a loop (5) that adds an identifying tag to mark it as the full combined feed. The other is sent to a filter (6) that strips out duplicate items. Because Netflix uses one description for a given movie in any format, I use the description to determine uniqueness. As a result, the two formerly identical feeds now differ slightly: one is tagged and complete, the other is untagged with the duplicates removed.
Then the two feeds are spliced back together (7) and sorted by the field with the identifying tag (8), in this case title. What's important here is that the tagged items appear after the untagged items, except for those that don't have a match after the previous filter.
Finally, the items are filtered for uniqueness again (9). The unique filter discards every identical item after the first, which means the tagged items, except again for those that don't have a match after the previous filter. Thus, when I filter out the items that don't have the identifying tag (10), all that are left are the items that appear in both feeds: in my queue and new to watch instantly.
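For anyone who wants to sanity-check the pipe's output outside Yahoo! Pipes, the end result described above (items present in both feeds, matched on description) can be approximated in a few lines of Python. This is only a rough sketch of the goal, not a translation of the pipe's modules. The function name, the dict layout, and the sample descriptions are all made up for illustration, and it assumes the feeds have already been parsed into lists of dicts.

```python
from typing import Dict, List

Item = Dict[str, str]


def items_in_both(first: List[Item], second: List[Item],
                  key: str = "description") -> List[Item]:
    """Return items from `first` whose `key` value also appears in `second`.

    Each matching value is returned once, in the order it appears in `first`.
    """
    # Collect every value of `key` seen in the second feed.
    seen_in_second = {item[key] for item in second if key in item}
    matches: List[Item] = []
    already_emitted = set()
    for item in first:
        value = item.get(key)
        if value in seen_in_second and value not in already_emitted:
            matches.append(item)
            already_emitted.add(value)
    return matches


# Placeholder data standing in for the two parsed Netflix feeds.
new_instant = [
    {"title": "Die Hard", "description": "placeholder description A"},
    {"title": "Example Movie B", "description": "placeholder description B"},
]
queue = [
    {"title": "Example Movie C", "description": "placeholder description C"},
    {"title": "Die Hard", "description": "placeholder description A"},
]

print([item["title"] for item in items_in_both(new_instant, queue)])
```

Matching on description rather than title follows the reasoning above: Netflix uses one description for a given movie in any format, so it is the more reliable key here.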
Turns out only one of my picks was added to View Instantly (but it's Die Hard, which is awesome). To check the pipe's accuracy, I added the newest instant view item to my queue, and it showed up right after the feeds updated. It worked with a second test too.
Obviously, working with data from a single source with known duplication makes this a lot easier, but it should work in less controlled situations with some tweaking. And it should work fine to check if someone is stealing your feed verbatim.
Labels: general tech
posted by Sumocat at 5/19/2009 09:48:00 PM
1 Comment:
Thanks for the idea, I've been trying to do this for a while and this was good inspiration.
I ended up going slightly simpler but this was a great starting point.
I made sure there was a common item in both feeds, joined them, filtered on unique and then made use of the item.y:repeatcount that gets generated by the unique filter to filter for item.y:repeatcount being greater than 1. These are the items that appeared in both feeds.
Thanks again for the pointers!
By JoeyJoe, at 9/24/2009 05:25:00 PM
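The simpler approach from the comment above (join the feeds, filter on unique, keep items with a repeat count greater than 1) can be sketched in Python as follows. This is a rough illustration only: the `repeatcount` key is a stand-in for the `y:repeatcount` field, and it assumes that field holds the number of times an item occurred in the joined feed. The function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def repeated_items(feed_a: List[Dict], feed_b: List[Dict],
                   key: str = "description") -> List[Dict]:
    """Join two feeds, de-duplicate on `key`, and keep items seen more than once."""
    joined = feed_a + feed_b
    # Count how often each key value occurs in the joined feed.
    counts = Counter(item[key] for item in joined)

    unique: List[Dict] = []
    seen = set()
    for item in joined:
        if item[key] not in seen:
            seen.add(item[key])
            # Assumption: this mimics the count that the unique filter attaches.
            unique.append({**item, "repeatcount": counts[item[key]]})

    return [item for item in unique if item["repeatcount"] > 1]


# Placeholder data standing in for two parsed feeds.
feed_a = [
    {"title": "Die Hard", "description": "placeholder description A"},
    {"title": "Example Movie B", "description": "placeholder description B"},
]
feed_b = [
    {"title": "Die Hard", "description": "placeholder description A"},
    {"title": "Example Movie C", "description": "placeholder description C"},
]

print([item["title"] for item in repeated_items(feed_a, feed_b)])
```

One caveat with counting this way: an item that appears twice within a single feed would also get a count above 1, so this works best when each feed is already free of internal duplicates.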