Chinese Character Count in PHP

I just spent five hours banging my head against the wall trying to figure this out. In hopes that I can prevent someone else from suffering the same fate, I decided to share this.

Background & Issues:

Chinese characters are word symbols rather than letters. A simple word count doesn't work because Chinese text has no spaces between words.

Instead you have to do a character count. Unfortunately, a plain substring call (PHP: substr()) won't work, because substr() counts bytes and each Chinese character takes more than one byte in UTF-8.

Keep in mind that the $content string is UTF-8 encoded. Each Chinese character in it is stored as three bytes, so the first ten bytes of the string cover only three complete Chinese characters plus one stray byte of the fourth.

$content = '有史以來最好的網站';
// As UTF-8 bytes (shown URL-encoded) this string is '%E6%9C%89%E5%8F%B2%E4%BB%A5%E4%BE%86%E6%9C%80%E5%A5%BD%E7%9A%84%E7%B6%B2%E7%AB%99'

$excerpt = substr($content, 0, 10);
// substr() counts bytes, not characters, so the cut lands inside the fourth character and leaves a broken character at the end of the excerpt.

echo $excerpt;

The above snippet will output:

有史以�

(The black-diamond-question-mark symbol is the Unicode replacement character, shown in place of the broken byte sequence.)
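One way to avoid the broken character is to cut by characters instead of bytes. The following is only a minimal sketch, assuming PHP's mbstring extension is loaded; utf8_excerpt() is a hypothetical helper name used for illustration.

```php
<?php
// Sketch: take an excerpt by character count rather than byte count.
// Assumes the mbstring extension is available.
// utf8_excerpt() is a hypothetical helper name, not a built-in function.
function utf8_excerpt(string $text, int $length): string
{
    // mb_substr() treats $length as a number of characters in the given encoding.
    return mb_substr($text, 0, $length, 'UTF-8');
}

$content = '有史以來最好的網站';

echo strlen($content), "\n";             // 27 (bytes)
echo mb_strlen($content, 'UTF-8'), "\n"; // 9 (characters)
echo utf8_excerpt($content, 3), "\n";    // 有史以
```

Here strlen() still reports bytes, while mb_strlen() and mb_substr() work in whole characters, so the excerpt never ends in the middle of one.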

Optimizing Google AdSense

When I implemented Google AdSense I noticed my pages loading slower. This was to be somewhat expected, but pages were stopping while each ad loaded. The sidebar was especially slow to appear.

I decided that the best solution would be to load the ads in the footer, but I was a little nervous about how best to achieve this. Luckily, I found a great tutorial on building an AdSense loader. I had to make a few tweaks to the jQuery, but it was a great starting point.

I explained in an earlier post how to display multiple AdSense ads in the loop using WP_Query properties. This builds on that functionality.

WordPress Tips: Post Count and Current Post

Recently, I added Google AdSense to my site. I wanted to have an ad beneath the last post on each page, and another ad in the middle of pages with multiple posts. To do this I needed to figure out the total number of posts in the current loop. I had a hard time figuring out how to do this so I wanted to share what I learned.

The $post_count and $current_post properties of the WP_Query class provide the number of posts in the current loop (the posts on the current page, not the whole site) and the zero-based index of the current post respectively. By using these two values I was able to identify the middle post and the last post on every page.
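As a rough sketch of that idea, the check can be written as a plain function that takes the two values. The function name ad_slot_after_post(), the slot labels, and the choice of "middle" (after the post that ends the first half of the page) are assumptions made for illustration, not necessarily the logic the full post uses.

```php
<?php
// Hypothetical helper: decides which ad slot, if any, follows a post.
// $current_post is zero-based, like WP_Query::$current_post.
// $post_count is the number of posts in the loop, like WP_Query::$post_count.
function ad_slot_after_post(int $current_post, int $post_count): ?string
{
    if ($post_count < 1) {
        return null; // empty loop, nothing to place
    }

    // The last post on the page always gets the bottom ad.
    if ($current_post === $post_count - 1) {
        return 'bottom';
    }

    // Assumed definition of "middle": after the post that closes the first half.
    // With a single post this index is -1, so no middle ad is placed.
    $middle = intdiv($post_count, 2) - 1;
    if ($current_post === $middle) {
        return 'middle';
    }

    return null;
}

// Inside the WordPress loop this could be called roughly like so:
// global $wp_query;
// $slot = ad_slot_after_post($wp_query->current_post, $wp_query->post_count);
```

Keeping the decision in a plain function means it can be checked outside WordPress; only the commented call at the bottom depends on the global $wp_query object.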
