Kurt Jarchow's Blog

June 3, 2009

My Take: Google Wave

Filed under: Tech Thoughts — Kurt Jarchow @ 6:56 pm

I watched the entire 1 hour and 20 minutes of the Google Wave preview and I have to admit it gave me goose bumps (and I can’t remember the last time that’s happened). I spent the next 2 days trying to understand why, and then the 2 days after that searching for other people’s ideas. I noticed something, though: there isn’t a lot of talk about how powerful this technology could be.

If you haven’t seen the webcast (I highly suggest you do), let me sum up: Google Wave is a new content model that allows fluid, as-you-type conversation with the ability to view the history of any changes in a “wave”. It also lets you treat a wave as an object, so you can share the wave anywhere on the web. It is also completely open source, so you can count on many developer modules.

Maybe I am falling into a “hype trap” here, but it’s my job to anticipate the web’s evolution, and I need to keep on my toes. Other posts I’ve read talk about Wave replacing email, but I think it goes much deeper than that. What would a seamless, real-time, portable, extensible content distribution model look like on the web?

I am forced to consider news and how it’s distributed, but I’m sure there are less obvious ways of using this service. Have you ever gone to cnn.com and gotten the feeling you’re missing something? What is happening right now? Maybe it’s a newly acquired ADHD symptom (thanks Twitter!), but everything feels tired. OK, say you have patience and that doesn’t matter to you (I think it does). Have you ever read a story that may have happened an hour ago and felt like you’ve missed out?

We are so used to a newspaper giving us our printed content that we expect an article on the web to reflect this, but we are missing out. Why can’t we see a reporter type out a story as it happens? Or why can’t additional images and video be added while we read? As the story unfolds, content is updated, and just like any other kind of content it’s completely mashable with other web services. Stock tickers, profiles of people, statistics for towns, Amazon book titles for authors: all can be added automatically (with some assistance) to everything you type.

It’s combining the best part of live TV journalism with the best elements of the web. We are no longer recording, we are telling a story. And because a wave is so portable, these stories could be happening anywhere on the web (content sites take note: curation, curation, curation).

This might all seem a little chaotic, but I know it’s manageable. Maybe it’s as easy as enabling/disabling live waves.

Now I will spend the next 2 days trying to figure out how this can be used with Government 2.0.  Anyone have any ideas on how we can use Google Wave?

April 20, 2009

SOLR and Multi-Languages

Filed under: Uncategorized — Kurt Jarchow @ 9:09 pm

One of our sites requires Japanese search capabilities, which looked easy on paper, but after playing around for the better part of a day I realized it wasn’t. It doesn’t take much to get it going, but sifting through mailing lists is pretty much the only way to get the details.

If you do a search using Japanese characters you’ll notice that SOLR will find results out of the box. The problem is that it uses English methods of search. The main problem this creates is that it assumes spaces separate words, which is not true in Japanese. A “tokenizer” creates the words for Lucene to use, so we need a tokenizer that will work with Japanese. The CJK Tokenizer does the trick.

SOLR will not handle multiple languages in a single field, so we need to create a new field for each language. First, though, you need to create the field type:

Now define the field:

I, for whatever reason, could not get this going without specifically identifying which field the search should use, so I added “qf=body_ja” to my query string. If you see other syntax for defining the field type, don’t use it (at least with SOLR 1.3-1.4). It seems to break the words up correctly, but you won’t be able to search.
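
A minimal Python sketch of building such a query string follows. The endpoint URL and the search term are placeholders, not values from this setup, and defType=dismax is an assumption (qf is a dismax parameter); only qf=body_ja comes from the text above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and path for your own install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_japanese_query(text, field="body_ja"):
    """Return a select URL that searches the language-specific field."""
    params = {
        "q": text,
        "defType": "dismax",  # assumption: qf is read by the dismax parser
        "qf": field,          # the per-language field named in this post
    }
    # urlencode percent-encodes the Japanese text as UTF-8 by default.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_japanese_query("東京")
print(url)
```

If your request handler already defaults to dismax, the defType entry can probably be dropped.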

This tokenizer is not 100% accurate, however. It breaks the characters into pairs and SOLR does its best to find matches. I’m not sure how much it costs, but you can find a better tokenizer at basistech.com.
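
To see why pairs are not 100% accurate, here is a rough Python sketch of the idea. It is not the actual CJK Tokenizer code: the character ranges are a simplification, and anything outside them is simply treated as a separator.

```python
def is_cjk(ch):
    """Rough check covering hiragana, katakana, and common CJK ideographs."""
    code = ord(ch)
    return (0x3040 <= code <= 0x30FF) or (0x4E00 <= code <= 0x9FFF)


def pairs(run):
    """Overlapping pairs; a lone character is kept as a single token."""
    if len(run) == 1:
        return ["".join(run)]
    return ["".join(run[i:i + 2]) for i in range(len(run) - 1)]


def bigram_tokens(text):
    """Split text into overlapping character pairs for each CJK run."""
    tokens = []
    run = []
    for ch in text:
        if is_cjk(ch):
            run.append(ch)
        else:
            # Simplification: non-CJK characters only end the current run.
            tokens.extend(pairs(run))
            run = []
    tokens.extend(pairs(run))
    return tokens


print(bigram_tokens("東京都庁"))  # ['東京', '京都', '都庁']
```

Notice that the middle pair is 京都 (Kyoto), even though the text is about Tokyo. A search for Kyoto would match this document, which is the kind of false hit a dictionary-based tokenizer avoids.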

After you adjust your front end to save into the new fields you should be off to the races.

April 6, 2009

SOLR 1.4 / LocalSOLR Gotchas

Filed under: Uncategorized — Kurt Jarchow @ 6:56 pm

I wanted to write down some strange behavior I came across… hopefully it’ll help someone.

Firstly, SOLR 1.4 changed the date range query syntax. Instead of “createddate[* to NOW]” try “createddate[* NOW]”.

Secondly, running an empty query with SOLR 1.4 and LocalSOLR has a funny syntax. If I’m doing a blank search without LocalSOLR I’d just use a space (or %20) to select all. But when I include LocalSOLR syntax, I have to use the old q=*:* syntax.
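
One way to avoid the problem is to never send a blank query at all. The Python sketch below always falls back to q=*:* when the search box is empty; the lat, long, and radius names in the usage line are placeholders for whatever LocalSOLR parameters your setup expects, not confirmed parameter names.

```python
from urllib.parse import urlencode


def build_query_params(user_text, extra_params=None):
    """Build select parameters, falling back to q=*:* for a blank search."""
    text = user_text.strip()
    params = {"q": text if text else "*:*"}
    # extra_params stands in for location parameters (hypothetical names).
    params.update(extra_params or {})
    return urlencode(params)


# A blank search box plus some made-up location parameters.
print(build_query_params("   ", {"lat": "51.8969", "long": "-8.4863", "radius": "10"}))
```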

Hope this helps someone to not waste as much time as I did trying to figure it out. If you’re developing in Drupal and want location-based search using SOLR, please check out http://drupal.org/node/347428

March 23, 2009

Is Google PageRank part of the stimulus package?

Filed under: Uncategorized — Kurt Jarchow @ 8:31 pm

I found this one really interesting: media giants want an inflated PageRank rating because they are… brand names? Trying to justify this is a backward step for net neutrality and a horrible attempt to keep the old media alive.

This actually plays nicely into a conversation I had with a friend of mine about crowd-sourcing and Twitter. He argued that the most relevant thoughts and ideas would get lost if a billion people were demanding the same attention. I argued, though, that crowd-sourcing is exactly what solves this problem; the best thoughts and ideas get brought to the surface. Sometimes the “best” sources don’t have a brand name and we need mechanisms like crowd-sourcing (PageRank) to sift them out.

I like this: “You should not have a system,” one content executive said, “where those who are essentially parasites off the true producers of content benefit disproportionately.” Can any newspaper say that they never find a story from another source? Am I wrong, or is he calling bloggers parasites?

I do sympathize with the journalists out there, but don’t try to cheat your way to the top. You’re only prolonging the inevitable.

I can’t see Google changing their algorithm to accommodate sites like ESPN. What do you think?

March 20, 2009

Same blog, new direction

Filed under: Uncategorized — Kurt Jarchow @ 8:16 pm

I haven’t been able to post much lately.  I have been settling back into Canada while still working by contract at home.  I’ve decided to go a step further and contract through my newly created company, K-Jar Consulting.

I have had a few minutes here and there to really think about my career and life goals. It’s not easy! What you really have to do is ignore all of those “fad” ideas that come and go and find your true calling. Even predating this blog I have been consistently passionate about government transparency and about applying web 2.0 principles to government (Government 2.0 if you want a buzzword).

Thankfully I have a like-minded friend who is equally passionate about data and its applications, so we have decided to team up and bring more awareness to the public. This blog will focus more on current issues and my thoughts on government 2.0 until we can build a home for our projects.

More details will be posted in the coming days and weeks, but if you’re like-minded about government and the future of democracy, please contact me at jarchow.kurt@gmail.com.

First order of business, find a website name!  Any ideas?

March 10, 2009

Hosting for Drupal & SOLR

Filed under: Uncategorized — Kurt Jarchow @ 2:54 am

Although setting up both is almost cut & paste, I thought this hosting solution from Acquia was interesting (mainly because it references SOLR specifically).

This might be a good solution for you if you want to get something up and running quickly.

Update on LocalSOLR & Drupal

Filed under: Uncategorized — Kurt Jarchow @ 2:51 am

I’ve been distracted with other things the past few months but I’ve still had this on the back burner. I’ve had a lot of difficulties integrating LocalSOLR into the apachesolr Drupal module. The project’s upgrade to 1.4 made me start from scratch again with the SOLR configuration, but with some patience and some help from pjaol (LocalSOLR author) I have a working copy. I’m still struggling, however, to make it run with other search handlers, like spelling and highlighting, but these aren’t really that important to me anyway. I’ll have the code up soon, I promise!

If you need it right away, email me and I can walk you through it.

March 6, 2009

Your Health Care Options – ads

Filed under: Uncategorized — Kurt Jarchow @ 2:12 pm

I was on MSNBC.com today and I noticed a nice ad from our Ministry of Health.  I was intrigued; the ad sold me on the promise of health care centers on Google maps.  This is something I don’t expect from our usually low-tech government.

[Image: ontariohealthbanner]

I clicked the Flash ad and was presented with this page:

[Image: ontariohealthbody]

Maybe it is just me, but if I see a nice fancy ad and am then brought to a text-heavy, boring listing of paragraphs, I would usually just click away. For the sake of writing, though, I dove deeper into the content and found the Google map (the hidden “gem” was the first link, “Medical Services Directory”, but you would never know unless you clicked it).

I bet there would be 50% fewer bounces if they added a simple image highlighting the content, or at the very least gave it a sexier name. Our tax money at work! Wasted ad spending.

That being said, the search isn’t too bad, but I would have liked to see the map bigger (the legend is twice as big as the map).

February 24, 2009

Time for a move

Filed under: Uncategorized — Kurt Jarchow @ 8:37 am

On a personal note, I’d like to announce that I’ll be moving back to Toronto at the end of the week. I’ve been living and working in Cork, Ireland for the past year to get a little European culture, but my wife and I feel that it is the right time to cut our adventure a little short. We’ve met a lot of great people here, and we’ll miss them all dearly.

I’ll be working from home when I come back for a while, so hopefully I’ll have more time to update my blog with more random thoughts.

Thanks to everyone in Cork who made our brief stay fantastic!

February 23, 2009

SOLR AND/OR Boolean Operators & DisMax

Filed under: Uncategorized — Kurt Jarchow @ 9:08 am

I had a problem with our search service. SOLR was doing a great job of retrieving search results, but with over 300,000 of them it was hard to really narrow down specific results. Most search engines have solved this with operators like AND/OR/NOT, and while SOLR supports these, that support seems to disappear when you enable dismax.

Unfortunately, this made the Drupal apachesolr installation incompatible with the AND/OR operators. But, as almost always with SOLR, I found a solution. Setting the “minimum should match” (mm) parameter to 1 seemed to do the trick and enabled the AND/OR operators. (NOTE: Please test your results before setting this live; there might be some unwanted side effects.)
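
For illustration only, this is roughly what sending mm=1 alongside a dismax query could look like when the parameters are built in Python. The defType=dismax entry is an assumption about how dismax is switched on in your install (a request handler default may already do it); mm is the parameter discussed above.

```python
from urllib.parse import urlencode


def dismax_params(text, min_should_match="1"):
    """Encode a dismax query with the minimum-should-match parameter set."""
    params = {
        "q": text,
        "defType": "dismax",      # assumption: enables the dismax parser
        "mm": min_should_match,   # "minimum should match" from this post
    }
    return urlencode(params)


encoded = dismax_params("java AND (cork OR dublin)")
print(encoded)
```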

When testing AND/OR I found it confusing until I read a great article explaining how the AND/OR system works. By using AND/OR you are actually just identifying text as being REQUIRED or OPTIONAL. I was first confused by the results of this search query:

java AND  (cork OR dublin)

What I’m asking for here is to find all results that must contain java in cork or dublin.  This won’t work.  Use:

java AND ( OR cork OR dublin)

This will properly identify cork and dublin as being optional.
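
The REQUIRED/OPTIONAL idea can be modelled with a toy matcher in Python. This is only an illustration of the concept, not SOLR’s query parser or scoring; the documents and the function name are made up.

```python
def matches(doc_terms, required, optional, min_should_match=0):
    """Return True if every required term is present and enough optional ones hit."""
    terms = set(doc_terms)
    if not all(term in terms for term in required):
        return False
    optional_hits = sum(1 for term in optional if term in terms)
    return optional_hits >= min_should_match


# Made-up documents, each reduced to a list of terms.
docs = {
    "a": ["java", "cork"],
    "b": ["java", "galway"],
    "c": ["php", "dublin"],
}

for name, terms in docs.items():
    print(name, matches(terms, required=["java"], optional=["cork", "dublin"]))
```

With the optional terms truly optional, documents a and b both match and c does not. Passing min_should_match=1 would drop b as well, which is the stricter behaviour of asking for java in cork or dublin.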
