May 12, 2016 #ruby #regex

If a task can be solved by sprinkling in some Ruby magic, it's often better to do that than to hunt for the perfect regular expression.

One day I was trying to convert a Liquid img tag to Markdown style. One tricky part is that the image caption can be omitted.

img_1 = "{% img http//example.com/image.jpg %}"
img_2 = "{% img http//example.com/photo.jpg Image Caption %}"

regex = /{% img (\S+) (.*)\s?%}/

img_1.match(regex)
#<MatchData "{% img http//example.com/image.jpg %}" 1:"http//example.com/image.jpg" 2:"">

img_2.match(regex)
#<MatchData "{% img http//example.com/photo.jpg Image Caption %}" 1:"http//example.com/photo.jpg" 2:"Image Caption ">

The caption matched for img_2 contained a trailing space: "Image Caption ". I tried tweaking the regex to eliminate the trailing space, but my limited knowledge couldn't get me there.

This is a good place to use gsub with a block, so I could borrow the power of String#strip to get rid of the trailing space.

def convert_liquid_img_tag_to_markdown(text)
  text.gsub(/{% img (\S+) (.*)\s*%}/) do |liquid_img_tag|
    url = Regexp.last_match(1) # same as \1 in a replacement string
    caption = Regexp.last_match(2).strip # drop the trailing space

    "![#{caption}](#{url})"
  end
end

convert_liquid_img_tag_to_markdown(img_1)
# => ![](http//example.com/image.jpg)
convert_liquid_img_tag_to_markdown(img_2)
# => ![Image Caption](http//example.com/photo.jpg)

The link above from thoughtbot mentioned the value of having an intention-revealing name, liquid_img_tag in this case. What's more powerful is that inside the block it's all Ruby and matched strings, so you can do whatever you want with them.
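As a rough sketch of what "whatever you want" could look like, here is a variation that uses named captures and falls back to the image file name when the caption is missing. The constant LIQUID_IMG, the method name liquid_img_to_markdown, and the file-name fallback are hypothetical choices for illustration, not part of the original conversion.

```ruby
# Hypothetical variation: named captures instead of numbered groups.
LIQUID_IMG = /{% img (?<url>\S+) (?<caption>.*)\s*%}/

def liquid_img_to_markdown(text)
  text.gsub(LIQUID_IMG) do
    match = Regexp.last_match
    caption = match[:caption].strip
    # Any Ruby works here, e.g. use the file name (without extension)
    # as the caption when none was given. This fallback is an assumption.
    caption = File.basename(match[:url], ".*") if caption.empty?

    "![#{caption}](#{match[:url]})"
  end
end

puts liquid_img_to_markdown("{% img http//example.com/image.jpg %}")
# => ![image](http//example.com/image.jpg)
puts liquid_img_to_markdown("{% img http//example.com/photo.jpg Image Caption %}")
# => ![Image Caption](http//example.com/photo.jpg)
```

Doing the same fallback in a pure replacement string would be awkward, which is where the block form earns its keep.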

I'm still curious about the right regex to get rid of the trailing space, but for a simple, one-off task it saves a lot of time to utilize the power of Ruby inside the gsub block.
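One candidate answer, offered as a sketch rather than the definitive regex: the trailing space shows up because (.*) is greedy and only gives back as much as it must. Making the caption group lazy with (.*?) and following it with \s* seems to cover both sample tags. The variable name lazy_regex is made up for this example.

```ruby
img_1 = "{% img http//example.com/image.jpg %}"
img_2 = "{% img http//example.com/photo.jpg Image Caption %}"

# (.*?) is lazy: it takes as few characters as possible, so the \s* that
# follows gets to consume the whitespace before the closing %}.
lazy_regex = /{% img (\S+) (.*?)\s*%}/

p img_1.match(lazy_regex)[2] # => ""
p img_2.match(lazy_regex)[2] # => "Image Caption"

# With the caption already clean, a plain replacement string is enough.
markdown = img_2.sub(lazy_regex, '![\2](\1)')
puts markdown # => ![Image Caption](http//example.com/photo.jpg)
```

This only handles the trailing whitespace. Extra spaces between the URL and the caption would still end up in the capture, so the block with String#strip remains the more forgiving option.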

Another example is logging the before/after:

def convert_liquid_github_tag(text)
  return unless text

  text.gsub(/{% gist (.+) %}/) do |match|
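    gist_id = Regexp.last_match(1)
    html = "<script src=\"https://gist.github.com/kinopyo/#{gist_id}.js\"></script>"

    puts "#{match} to #{html}" # log the before/after of each replacement
    html # the block's return value becomes the replacement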
  end
end