Kurt Jarchow's Blog

September 4, 2009

All About the Middle

Filed under: Uncategorized — Kurt Jarchow @ 2:56 pm

I don’t have a whole lot of interest in journalism, but I am going to pay close attention to the current news industry crisis.  Why?  Hint: it’s about much more than the news.

The information revolution has had a dramatically different effect on various industries, but there is a common theme to them all: what really broke the system was the breaking of hierarchical (top-down) control by empowering users.  For the rest of this post, try picturing the companies as a few big dots sitting above millions of little dots.  If you “connected the dots” (drew the lines of distribution) you would have seen many vertical lines connected to the few big dots at the top.  Now consider today’s situation.  The little dots are connecting with each other; this is the age of the horizontal line.

So why should people pay attention to the new news?  It’s not just a discussion about saving the news; it’s the first time a crisis of industry disruption is being openly debated and (hopefully) leading to a reboot.  I watched or listened to as much of the Aspen Institute’s FOCUS 2009 conference as I could because it was a great meeting of minds, and I believe a lot of those concepts will leak into other industries when similar disruptions occur.

I’ll share some interesting points I noted listening to the conference:

  1. Jeff Jarvis talks of “layoffs with strings”.  When the Top needs to lay off an employee, the employee may be interested in starting their own site.  It would be beneficial for an organization like the NYT not only to keep ties with a former employee, but to invest in them.
  2. Focus on users.  I can’t stress this enough because if you don’t do it, someone else will.  Your business will suffer if you don’t make the sacrifices your users want to see.
  3. Marissa Mayer had a great idea to include news in your “life stream”.  It would intelligently find news and filter it by your interests in the form of a Facebook-esque feed.  It’s just another great example of concentrating on what users want, not on your own tradition.
  4. Hyper-localization and niches are key.  People love niches, and they love relevant content.
  5. Many users want to participate, and if they’re shy they like to see others participate.  Getting people involved is the surest way to get users returning.
  6. Trust.  Probably the most important point they hit on.  If users don’t trust your site, given an alternative, they will not use it.  Building trust will mean being transparent, and transparency is the new objectivity.

Some of these are specific to the print industry, but I’d argue most are ideological changes that any industry can use to stay relevant (unfortunately most won’t).  It’s about stripping away the notion of top-down control, and about being actively involved in the Middle (with all the little dots).  Stop “fighting the internet” as Dave Winer would say, because in the end you’re going to get beat.

I owe a lot of this thinking to Jay Rosen, an NYU journalism professor who has tirelessly “reported” on the new news (rebooting the news).

April 20, 2009

SOLR and Multi-Languages

Filed under: Uncategorized — Kurt Jarchow @ 9:09 pm

One of our sites requires Japanese search capabilities, which looked easy on paper, but after playing around for the better part of a day I realized it wasn’t. It doesn’t take much to get it going, but sifting through mailing lists is pretty much the only way to get the details.

If you do a search using Japanese characters you’ll notice that solr will find results out of the box. The problem is that it uses English methods of search. The main problem this creates is that it assumes spaces separate words, which is not true in Japanese. A “tokenizer” creates the words for lucene to use, so we need a tokenizer that will work with Japanese. The CJK Tokenizer does the trick.

SOLR will not handle multiple languages in a single field so we need to create a new field for each language. First though you need to create the type of field:

Now define the field:
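The snippets for these two steps are missing from this copy of the post, so what follows is a hedged reconstruction, not the original config. It assumes a field type named text_cjk (a made-up name), the solr.CJKTokenizerFactory class that Solr 1.3/1.4 provides for the CJK tokenizer, and the body_ja field used later in this post. The XML is the part that would go into schema.xml; the Python around it only checks that the fragment is well-formed and that every field points at a declared type.

```python
import xml.etree.ElementTree as ET

# Assumed schema.xml fragment. "text_cjk" is a hypothetical type name;
# "body_ja" is the field name used in this post.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.1">
  <types>
    <fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.CJKTokenizerFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="body_ja" type="text_cjk" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def undeclared_field_types(schema_xml):
    """Return names of fields whose type is not declared in the fragment."""
    root = ET.fromstring(schema_xml)
    declared = {ft.get("name") for ft in root.iter("fieldType")}
    return [f.get("name") for f in root.iter("field")
            if f.get("type") not in declared]


if __name__ == "__main__":
    # An empty list means every field refers to a declared type.
    print(undeclared_field_types(SCHEMA_FRAGMENT))
```

One field per language follows the same pattern: add another field element (and, if needed, another field type) for each language you index.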

I, for whatever reason, could not get this going without specifically identifying which field the search should use, so I added “qf=body_ja” to my query string. If you see other syntax for defining the field type don’t use it (at least with solr 1.3-1.4). It seems to break the words up correctly but you won’t be able to search.
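A minimal sketch of building such a query string, assuming a local Solr select endpoint (the URL below is a placeholder) and assuming the dismax parser is selected with defType, since qf is a dismax parameter. Only the qf=body_ja part comes from this post.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust host, port, and path for your install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def japanese_search_url(user_query, field="body_ja"):
    """Build a select URL that restricts matching to one language field."""
    params = {
        "q": user_query,
        "defType": "dismax",  # assumption: qf is read by the dismax parser
        "qf": field,          # the per-language field from this post
    }
    # urlencode percent-encodes the Japanese characters as UTF-8.
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(japanese_search_url("日本"))
```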

This tokenizer is not 100% accurate, however. It breaks the characters into pairs and solr does its best to find matches. I’m not sure how much it costs, but you can find a better tokenizer at basistech.com.

After you adjust your front end to save into the new fields you should be off to the races.
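To see why pairing characters is only approximate, here is a simplified sketch of the idea in Python. It is not the tokenizer’s actual code: it only splits on whitespace and emits overlapping two-character tokens, and the function name is made up for illustration.

```python
def cjk_bigrams(text):
    """Split each whitespace-separated run into overlapping character pairs."""
    tokens = []
    for run in text.split():
        if len(run) == 1:
            # A lone character has no neighbour to pair with.
            tokens.append(run)
        tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


if __name__ == "__main__":
    # 東京都 (Tokyo Metropolis) also yields 京都 (Kyoto) as a token.
    print(cjk_bigrams("東京都"))
```

Because 東京都 produces the pair 京都, a search for Kyoto can match a document that only mentions Tokyo, which is the kind of imprecision a dictionary-based tokenizer avoids.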

April 6, 2009

SOLR 1.4 / LocalSOLR Gotchas

Filed under: Uncategorized — Kurt Jarchow @ 6:56 pm

I wanted to write down some strange behavior I came across… hopefully it’ll help someone.

Firstly, SOLR 1.4 changed the date range query syntax.  Instead of “createddate[* to NOW]” try “createddate[* NOW]”.

Secondly, running an empty query with SOLR 1.4 and LocalSOLR has a funny syntax.  If I’m doing a blank search without LocalSOLR I’d just use a space (or %20) to select all.  But when I include LocalSOLR syntax, I have to use the old q=*:* syntax.

Hope this helps someone to not waste as much time as I did trying to figure it out.  If you’re developing in Drupal and want location-based search using solr, please check out http://drupal.org/node/347428
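One way to sidestep the difference is to always send *:* for a blank search. This is a small sketch under that assumption (that *:* is accepted with or without LocalSOLR); the helper name and the extra parameter in the usage line are hypothetical placeholders, not real LocalSOLR parameter names.

```python
from urllib.parse import urlencode

MATCH_ALL = "*:*"


def select_params(user_text, extra_params=None):
    """Return an encoded query string, falling back to *:* for a blank search."""
    query = user_text.strip() or MATCH_ALL
    params = {"q": query}
    # extra_params is where any location parameters would be passed in.
    params.update(extra_params or {})
    return urlencode(params)


if __name__ == "__main__":
    print(select_params("   "))
    print(select_params("pizza", {"example_geo_param": "1"}))
```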

March 23, 2009

Is Google PageRank part of the stimulus package?

Filed under: Uncategorized — Kurt Jarchow @ 8:31 pm

I found this one really interesting: media giants want an inflated PageRank rating because they are… brand names?  Trying to justify this is a backward step for net neutrality and a horrible attempt to keep the old media alive.

This actually plays nicely into a conversation I had with a friend of mine about crowd-sourcing and twitter.  He argued that the most relevant thoughts and ideas would get lost if a billion people were demanding the same attention.  I argued that crowd-sourcing is exactly what solves this problem; the best thoughts and ideas get brought to the surface.  Sometimes the “best” sources don’t have a brand name and we need mechanisms like crowd-sourcing (PageRank) to sift them out.

I like this: “You should not have a system,” one content executive said, “where those who are essentially parasites off the true producers of content benefit disproportionately.” Can any newspaper say that they never find a story from another source?  Am I wrong, or is he calling bloggers parasites?

I do sympathize with the journalists out there, but don’t try and cheat your way to the top.  You’re only prolonging the inevitable.

I can’t see Google changing their algorithm to accommodate sites like ESPN.  What do you think?

March 20, 2009

Same blog, new direction

Filed under: Uncategorized — Kurt Jarchow @ 8:16 pm

I haven’t been able to post much lately.  I have been settling back into Canada while still working by contract at home.  I’ve decided to go a step further and contract through my newly created company, K-Jar Consulting.

I have had a few minutes here and there to really think about my career and life goals.  It’s not easy!  What you really have to do is ignore all of those “fad” ideas that come and go and find your true calling.  Even predating this blog I have been consistently passionate about government transparency, and about applying web 2.0 principles to government (Government 2.0 if you want a buzzword).

Thankfully I have a like-minded friend who is equally passionate about data and its applications, so we have decided to team up and bring more awareness to the public.  This blog will focus more on current issues and my thoughts on government 2.0 until we can build a home for our projects.

More details will be posted in the coming days and weeks, but if you’re as passionate about government and the future of democracy, please contact me at [email protected]

First order of business, find a website name!  Any ideas?

March 10, 2009

Hosting for Drupal & SOLR

Filed under: Uncategorized — Kurt Jarchow @ 2:54 am

Although setting up both is almost cut & paste I thought this hosting solution from Acquia was interesting (mainly because it references SOLR specifically).

This might be a good solution for you if you want to get something up and running quickly.

Update on LocalSOLR & Drupal

Filed under: Uncategorized — Kurt Jarchow @ 2:51 am

I’ve been distracted with other things the past few months but I’ve still had this on the back-burner.  I’ve had a lot of difficulties integrating LocalSOLR into the apachesolr drupal module.  The project’s upgrade to 1.4 made me start from scratch again with the SOLR configuration, but with some patience and some help from pjaol (LocalSOLR author) I have a working copy.  I’m still struggling to make it run with other search handlers, like spelling and highlighting, but these aren’t really that important to me anyway.  I’ll have the code up soon, I promise!

If you need it right away, email me and I can walk you through it.

March 6, 2009

Your Health Care Options – ads

Filed under: Uncategorized — Kurt Jarchow @ 2:12 pm

I was on MSNBC.com today and I noticed a nice ad from our Ministry of Health.  I was intrigued; the ad sold me on the promise of health care centers on Google maps.  This is something I don’t expect from our usually low-tech government.

[Image: ontariohealthbanner]

I clicked the Flash ad and was presented with this page:

[Image: ontariohealthbody]

Maybe it is just me, but if I see a nice fancy ad and am then brought to a text-heavy, boring listing of paragraphs, I usually just click away.  For the sake of writing, though, I dove deeper into the content and found the Google map (the hidden “gem” was the first link, “Medical Services Directory”, but you would never know unless you clicked it).

I bet there would be 50% fewer bounces if they added a simple image highlighting the content, or at the very least gave it a sexier name.  Our tax money at work!  Wasted ad spending.

That being said the search isn’t too bad, but I would have liked to see the map bigger (the legend is twice as big as the map).

February 24, 2009

Time for a move

Filed under: Uncategorized — Kurt Jarchow @ 8:37 am

On a personal note, I’d like to announce that I’ll be moving back to Toronto at the end of the week.  I’ve been living and working in Cork, Ireland for the past year to get a little European culture, but my wife and I feel that it is the right time to cut our adventure a little short.  We’ve met a lot of great people here, and we’ll miss them all dearly.

I’ll be working from home for a while when I come back, so hopefully I’ll have more time to update my blog with more random thoughts.

Thanks to everyone in Cork who made our brief stay fantastic!

February 23, 2009

SOLR AND/OR Boolean Operators & DisMax

Filed under: Uncategorized — Kurt Jarchow @ 9:08 am

I had a problem with our search service.  SOLR was doing a great job of retrieving search results, but with over 300,000 of them it was hard to really narrow down to specific results.  Most search engines solve this with operators like AND/OR/NOT, and while SOLR supports these, that support seems to disappear when you enable dismax.

Unfortunately, this made the Drupal apachesolr installation incompatible with the AND/OR operators.  But, as is almost always the case with SOLR, I found a solution.  Setting the “minimum should match” (mm) parameter to 1 seemed to do the trick and re-enable the AND/OR operators.  (NOTE: Please test your results before setting this live; there might be some unwanted side-effects.)

When testing AND/OR I found it confusing until I read a great article explaining how the AND/OR system works.  By using AND/OR you are actually just identifying text as being REQUIRED or OPTIONAL.  I was first confused by the results with this search query:

java AND  (cork OR dublin)

What I’m asking for here is to find all results that must contain java in cork or dublin.  This won’t work.  Use:

java AND ( OR cork OR dublin)

This will properly identify cork and dublin as being optional.
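The REQUIRED/OPTIONAL idea can be sketched in a few lines of Python. This is a simplified model of the behaviour commonly described for the classic Lucene query parser with OR as the default operator, not a reproduction of what SOLR or dismax actually does: it ignores parentheses, NOT, and quoting, and the function name is made up for illustration.

```python
def to_clauses(query):
    """Map a flat 'a AND b OR c' query to (prefix, term) pairs.

    '+' marks a required term, '' marks an optional one.
    """
    clauses = []
    pending_and = False
    for token in query.split():
        if token == "AND":
            pending_and = True
            if clauses:
                # AND also promotes the clause before it to required.
                clauses[-1] = ("+", clauses[-1][1])
        elif token == "OR":
            # OR leaves the neighbouring clauses optional.
            pending_and = False
        else:
            clauses.append(("+" if pending_and else "", token))
            pending_and = False
    return clauses


if __name__ == "__main__":
    for prefix, term in to_clauses("java AND cork OR dublin"):
        print(prefix + term)
```

In this model the operators are not evaluated as boolean algebra; each one only flags the terms next to it, which is why the results can look surprising at first.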
