July 2012 Archives

As an ASX "participant" I have used Wotnews, Yahoo! Alerts and Google Alerts to monitor for news pertaining to my investments, particularly based on key officer names, product names, and obviously the company names.

Wotnews works well because they've restricted their corpus to Australian sources. I get a single email each morning with a list of relevant headlines and a short snippet from each article. Problem is that Wotnews is closing down soon.

Google and Yahoo! alerts are pretty noisy because they scour the entire web. Besides that, it's apparent that no one at Google actually uses their news alerts service because it's quite obviously broken -- articles that are in their news index don't show up at all in the relevant alert, and it has remained that way for weeks at a time.

So I read with interest Jason Calacanis' most recent missive on his latest experiment, the LAUNCH Ticker.

I share his first 3 desires:

1. Just the facts in as few words as possible.
2. Only cover things you think I would find interesting/important
3. Give me screenshots, graphs and appropriate links as cleanly as possible.

The tricky thing is #2, and consequently how you implement #1 and #3, though I consider the latter two nice-to-haves as no one has solved relevancy (#2) well.

Jason's LAUNCH Ticker works great for him, I'm sure, but to me it's full of irrelevant crap. That's not his fault, obviously; we simply have different interests.

Relevance is hard -- there's the relevant news you know you want and can specify the right keywords for: people, topics, brands, etc.; essentially entities expressible as text. And there's related news that you don't yet know you'd be interested in -- for example a new competitor, a new product, a new industry.

I think the latter, though, is another nice-to-have. No one seems to have solved the basic problem: given a list of entities, give me a concise summary of relevant news.
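To make that basic problem concrete, here is a minimal sketch of the naive approach: match a list of entity names against headlines and snippets, optionally restricted to a whitelist of sources (as Wotnews does with Australian sources). The `Article` type, the function names and the sample data are all hypothetical, and plain text matching is only an assumption about how one might start, not how Wotnews or any other service actually works.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    headline: str
    snippet: str
    source: str  # hypothetical: the site the article came from


def build_matcher(entities):
    # Whole-word, case-insensitive match; longer names first so that
    # "Acme Mining" is preferred over a shorter overlapping name.
    escaped = sorted((re.escape(e) for e in entities), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def daily_digest(articles, entities, allowed_sources=None):
    """Return one line per article that mentions at least one entity."""
    if not entities:
        return []
    matcher = build_matcher(entities)
    lines = []
    for a in articles:
        # Restricting the corpus is the cheapest noise filter available.
        if allowed_sources is not None and a.source not in allowed_sources:
            continue
        text = a.headline + " " + a.snippet
        hits = sorted({m.group(0).lower() for m in matcher.finditer(text)})
        if hits:
            lines.append(f"{a.headline} [{a.source}] (matched: {', '.join(hits)})")
    return lines
```

Even this much shows where the difficulty lies: keyword matching finds the names, but it cannot tell an announcement that matters from gossip that happens to mention the same company.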

To give a concrete example, Wotnews turns this:

Into this:

Granted Wotnews is not perfect; there's still a bit of noise (what I would call tech industry trash gossip). But it saves me from scouring 5-10 individual papers, and it picks up the odd relevant blog post.

If there are any bright sparks out there who would like some end-user advice on building such a tool, please feel free to get in touch. Over the years I've seen many "personalised news" services come and go, and very few of them produced anything useful.

I'm not aware of any service that uses a real entity/knowledge graph to filter relevant news, though I'm yet to try out LexisNexis' services (if anyone has any experience here please share, I'd like to know how well their services work, and the rough pricing).

I wrangled data feeds for many years at Yahoo! and it was always interesting how things could break, and the flow-on effects. Y! didn't (and still doesn't) produce much original content; most of it is licensed from 3rd party providers.

One interesting scenario occurred after we had recently changed our provider of sports data. The old provider (let's call them ABC) accused the new provider (call them XYZ) of stealing their data. Note that this was back in the day when these providers were literally hiring folks to watch live sport on TV in order to type in the scores that they would then send to us in XML, so anything you could do to minimise that labour cost was a huge win.

ABC suspected XYZ of scraping the scores data off ABC's website. In order to catch XYZ scraping, ABC deliberately published incorrect data on their own website. In one instance they swapped the final scores of a particular NBL game, waited for XYZ (and Y! Sports) to update, and then flicked the scores back to see if the changes were reflected on both sites (they were).

I'm not sure how that eventually panned out from a legal POV but both ABC and XYZ continue to exist today.
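The trap ABC set is easy to express in code. The sketch below is purely illustrative: the function names, the game identifier and the data shapes are hypothetical, and it says nothing about how either provider actually built their systems. It plants a swapped score and then checks whether a downstream feed reproduces the planted value rather than the true one.

```python
def swap_scores(scores):
    """Return the (home, away) pair with the two scores swapped."""
    home, away = scores
    return (away, home)


def detect_copied_canaries(planted, observed):
    """Find planted errors that show up in someone else's feed.

    planted:  game_id -> (true_scores, planted_scores)
    observed: game_id -> scores seen in the downstream feed
    """
    copied = []
    for game_id, (true_scores, fake_scores) in planted.items():
        if fake_scores == true_scores:
            # A drawn game swaps to itself, so it proves nothing.
            continue
        if observed.get(game_id) == fake_scores:
            copied.append(game_id)
    return sorted(copied)


# Hypothetical game: the true result is 98-91, published as 91-98.
true_result = (98, 91)
planted = {"nbl-001": (true_result, swap_scores(true_result))}
```

Calling `detect_copied_canaries(planted, {"nbl-001": (91, 98)})` flags the game, while a feed carrying the true result is not flagged. An independent source is very unlikely to reproduce an error that was only ever published in one place.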

Yesterday one of my monitoring scripts alerted me to a teeny tiny quirk on the ASX Company Announcements page:

Notice the "-" under pages. The announcement obviously doesn't have "-" pages, in fact it has exactly one page.

So it made me wonder how many downstream systems parroted that quirk.

Here's CommSec:

And ETrade:

And CMC Markets:

So the answer appears to be an exciting ... zero.

Oddly enough, ASX's own ERN-specific announcements page also lists zero rather than "-".

Which suggests that the code on their generic announcements listing is not the same as the code used on the ticker-specific pages.

Finally, it led me to ponder how that page count is generated. Clearly if it was derived from the PDF it would not be "-" or zero, as the PDF has one page. So maybe someone, somewhere, is typing those numbers in?

That may make sense given that up until ~2003 many of the announcements were PDFs of what appeared to be faxed documents; the number of pages could have served as a simple check that all pages were successfully received.

I wish I could find out what happened behind the scenes to create this "-" or zero page datum. My monitoring script has been running on the ASX for 7 years and this is the first time something like this has appeared.
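For anyone wanting to catch the same sort of quirk, a check along these lines would do it. This is not my actual monitoring script, just a minimal sketch: it assumes the announcements have already been scraped into hypothetical `(headline, pages)` pairs, with the pages column kept as raw text.

```python
def check_pages_field(raw):
    """Return a description of the problem, or None if the value looks sane."""
    text = raw.strip()
    if not (text.isascii() and text.isdigit()):
        return f"non-numeric page count: {text!r}"
    if int(text) < 1:
        return "page count is zero"
    return None


def find_anomalies(rows):
    """rows: iterable of (headline, pages_text) pairs from a listing page."""
    anomalies = []
    for headline, pages in rows:
        problem = check_pages_field(pages)
        if problem is not None:
            anomalies.append((headline, problem))
    return anomalies
```

Keeping the raw text rather than converting to an integer up front is what lets the "-" and the zero be told apart, which is the difference between the generic listing and the ticker-specific page.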




About this Archive

This page is an archive of entries from July 2012 listed from newest to oldest.
