How to Install Piwik to Analyse Access Logs on CentOS 6.5

0. I had to install Piwik on top of nginx, PHP 5.5, and MySQL with the InnoDB engine to improve performance. This post might be useful for people who use Piwik to analyse access logs, since Piwik's log analytics feature isn't very fast…

1. Install nginx, mysql, and php.
Enable epel repo

Necessary packages except for PHP (a very rough list; not all of them are necessarily required)

yum install geoip geoip-devel pecl-geoip-php mysql-server mysql mysql-libs httpd-devel libpng-devel libgd-devel libjpeg-devel gcc curl-devel zlib-devel libxml2-devel gd-2 apr-util-devel

Installing PHP from source to get the newest version. If any libraries are missing, install them via yum.

Download the newest PHP source with wget, extract it, change into the source directory, and run the command below:
./configure --with-config-file-path=/etc --with-config-file-scan-dir=/etc/php.d --with-apxs2 --with-libdir=lib64 --with-zlib --enable-calendar --enable-mbstring --with-mcrypt --with-mysql --with-mysqli --with-iconv --with-pear --enable-fpm --with-pdo-mysql --with-gd --enable-gd-native-ttf --enable-gd-jis-conv --with-vpx-dir=/usr --with-jpeg-dir=/usr --with-xpm-dir=/usr --with-freetype-dir=/usr --with-curl

make clean
make
make install

Copy php.ini to /etc/.
Edit php-fpm.conf and change user and group from nobody to nginx.
Then run:
pecl install geoip
cp /php-5.5.9/sapi/fpm/init.d.php-fpm /etc/init.d/php-fpm
service php-fpm start
Increase memory_limit to 8192M in php.ini; otherwise Piwik cannot process the monthly and yearly reports.
If, like me, you don't have 8 GB of RAM, you need to create swap space.
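A minimal sketch of sizing and creating that swap space, assuming the goal is RAM plus swap of at least 8192 MB to match the memory_limit above. The helper name swap_needed_mb and the path /swapfile are made up for illustration. The commands that actually create the swap file need root, so the sketch only prints them for review.

```shell
#!/bin/bash
# swap_needed_mb is a hypothetical helper: it prints how many MB of swap
# are needed so that RAM + swap reaches the target.
swap_needed_mb() {
    local ram_mb=$1 target_mb=$2
    if [ "$ram_mb" -ge "$target_mb" ]; then
        echo 0
    else
        echo $(( target_mb - ram_mb ))
    fi
}

# MemTotal in /proc/meminfo is reported in kB; fall back to 0 if unreadable.
if [ -r /proc/meminfo ]; then
    ram_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
else
    ram_mb=0
fi
need_mb=$(swap_needed_mb "$ram_mb" 8192)

# Print the commands instead of running them. /swapfile is an assumed path.
if [ "$need_mb" -gt 0 ]; then
    echo "dd if=/dev/zero of=/swapfile bs=1M count=$need_mb"
    echo "chmod 600 /swapfile"
    echo "mkswap /swapfile"
    echo "swapon /swapfile"
fi
```

Run the printed commands as root once the size looks right. Swap enabled with swapon does not survive a reboot unless it is also listed in /etc/fstab.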

2. Configure MySQL
Append the following to /etc/my.cnf

default-table-type = InnoDB
key_buffer = 16M
max_allowed_packet = 16M
thread_stack = 256K
thread_cache_size = 16
innodb_buffer_pool_size = 2048M
innodb_log_file_size = 512M
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 2
sync_binlog = 0
innodb_flush_method = O_DIRECT
transaction-isolation = READ-COMMITTED

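Because innodb_log_file_size has changed, remove the old InnoDB log files before starting MySQL (do this only on a fresh install with no data you need):
rm -f /var/lib/mysql/ib_logfile*
service mysqld start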

Create a database and user for Piwik

mysql_secure_installation
mysql -u root -p
create database piwik_db;
create user 'username'@'localhost' identified by 'your_password';
grant all on piwik_db.* to 'username'@'localhost';
grant file on *.* to 'username'@'localhost';
flush privileges;
quit;

3. Configure nginx
Edit /etc/nginx/nginx.conf and set the number of worker processes according to the number of CPU cores you have

worker_processes 4;
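If you are not sure how many cores the machine has, the short sketch below prints a matching worker_processes line. It assumes a Linux system where either nproc (GNU coreutils) or /proc/cpuinfo is available.

```shell
# Count logical CPU cores; fall back to /proc/cpuinfo if nproc is missing.
cores=$(nproc 2>/dev/null || grep -c '^processor' /proc/cpuinfo)

# Print the directive to paste into /etc/nginx/nginx.conf.
echo "worker_processes $cores;"
```

The same number is a reasonable starting value for the recorders option of the log importer in step 7.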

Edit /etc/nginx/conf.d/default.conf to enable PHP. I changed the index and root for php-fpm. Here is an extract:

location / {
root /usr/share/nginx/html;
index index.php;
}

location ~ \.php$ {
root /usr/share/nginx/html;
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME /usr/share/nginx/html$fastcgi_script_name;
include fastcgi_params;
}

Start nginx.

service nginx start

4. Install Piwik (finally!)
Get a zip file of your preferred version from the official site and unzip it to /usr/share/nginx/html
I downloaded the newest version from http://builds.piwik.org/

cd /usr/share/nginx/html
chown nginx:nginx -R piwik
chmod 0755 -R piwik

If analysis is slow, you might need to redirect piwik/tmp/cache/tracker to /dev/shm.
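One way to do that redirect is a symlink. The sketch below is an assumption about how to set it up, not something Piwik ships: the function name link_tracker_cache and the directory name piwik-tracker-cache are made up for illustration, and it assumes the tracker cache files can be safely regenerated.

```shell
#!/bin/bash
# link_tracker_cache is a hypothetical helper.
# $1: Piwik install directory, $2: base directory on tmpfs (e.g. /dev/shm)
link_tracker_cache() {
    local piwik_dir=$1 shm_base=$2
    local src="$piwik_dir/tmp/cache/tracker"
    local dst="$shm_base/piwik-tracker-cache"

    # Create the in-memory target and the parent of the link.
    mkdir -p "$dst" "$(dirname "$src")"

    # Remove the on-disk cache directory unless it is already a symlink.
    if [ -d "$src" ] && [ ! -L "$src" ]; then
        rm -rf "$src"
    fi

    # Point the cache path at the in-memory directory.
    ln -sfn "$dst" "$src"
}

# Example call (as root), using the install path from this post:
# link_tracker_cache /usr/share/nginx/html/piwik /dev/shm
```

/dev/shm is emptied on reboot, so the target directory has to be recreated at boot, and it should be owned by nginx like the rest of the Piwik tree.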

For those who need region and city reports:
Get GeoIP.dat and GeoCityIP.dat from
http://dev.maxmind.com/geoip/legacy/geolite/
Download them to /usr/share/nginx/html/piwik/misc as GeoIP.dat and GeoCityIP.dat and change their owner to nginx.
Then add the line below to /etc/php.ini:
geoip.custom_directory=/usr/share/nginx/html/piwik/misc

Add the lines below to /etc/nginx/fastcgi.conf (or to fastcgi_params, which is the file included in the location block above).
fastcgi_param GEOIP_ADDR $remote_addr;
fastcgi_param GEOIP_COUNTRY_CODE $geoip_country_code;
fastcgi_param GEOIP_COUNTRY_NAME $geoip_country_name;
fastcgi_param GEOIP_REGION $geoip_region;
fastcgi_param GEOIP_REGION_NAME $geoip_region_name;
fastcgi_param GEOIP_CITY $geoip_city;
fastcgi_param GEOIP_AREA_CODE $geoip_area_code;
fastcgi_param GEOIP_LATITUDE $geoip_latitude;
fastcgi_param GEOIP_LONGITUDE $geoip_longitude;
fastcgi_param GEOIP_POSTAL_CODE $geoip_postal_code;

Add the lines below to the http section in /etc/nginx/nginx.conf
geoip_country /usr/share/nginx/html/piwik/misc/GeoIP.dat;
geoip_city /usr/share/nginx/html/piwik/misc/GeoCityIP.dat;

5. Secure your server if you are paranoid like me.

iptables -F
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A INPUT -s "IP ADDRESS (add /prefix to specify a subnet) you want to allow" -j ACCEPT
iptables -A OUTPUT -p tcp -d "IP ADDRESS (add /prefix to specify a subnet) you want to allow" -j ACCEPT
/sbin/service iptables save
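To make the address placeholders concrete, here is a small sketch that prints the allow rules for a list of addresses. The function name allow_rules is made up, and the addresses are documentation examples, not real hosts. It uses ACCEPT as the target because the default policies are already DROP, and it only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/bash
# allow_rules is a hypothetical helper: it prints one INPUT and one OUTPUT
# allow rule per address or subnet given as an argument.
allow_rules() {
    local addr
    for addr in "$@"; do
        echo "iptables -A INPUT -s $addr -j ACCEPT"
        echo "iptables -A OUTPUT -p tcp -d $addr -j ACCEPT"
    done
}

# A single host and a /24 subnet (example addresses only).
allow_rules 192.0.2.10 198.51.100.0/24
```

If you are connected over SSH, add the rule for your own address before setting the DROP policies, otherwise you will lock yourself out.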

6. Access the server via a browser (http://127.0.0.1/piwik) and configure Piwik.
You should disable browser-triggered archiving if you only use Piwik to analyse access logs.
Change the GeoIP setting to your preference; I use MaxMind GeoIP.
I also added the lines below to the [General] section of config/config.ini.php.

enable_processing_unique_visitors_day = 1
enable_processing_unique_visitors_week = 1
enable_processing_unique_visitors_month = 1
enable_processing_unique_visitors_year = 1
enable_processing_unique_visitors_range = 1
browser_archiving_disabled_enforce = 1
session_save_handler = dbtable
enable_marketplace = 0
enable_auto_update = 0

7. Start analysing your access logs! Change the number of recorders to the number of cores in your CPU.

cd /var/log/httpd/
for log in * ; do
/usr/share/nginx/html/piwik/misc/log-analytics/import_logs.py --url=http://127.0.0.1/piwik "$log" --idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --enable-reverse-dns --exclude-path="127.0.0.1" ;
done

8. Archive the analysed data. You need to run this every time after an analysis.

/usr/share/bin/php /usr/share/nginx/html/piwik/misc/cron/archive.php --url=http://127.0.0.1/piwik/ > pwk_archive.log
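Since archiving has to follow every import, the two steps can be chained in one script. The sketch below is one possible wrapper, not part of Piwik: the function name import_and_archive and the RUN switch are made up, the paths are the ones used in this post, and only a subset of the import options is shown for brevity. By default it only prints the commands; run it with RUN set to an empty value to execute them.

```shell
#!/bin/bash
# Paths and URL as used in this post; adjust them to your install.
PIWIK_DIR=/usr/share/nginx/html/piwik
PIWIK_URL=http://127.0.0.1/piwik
PHP_BIN=/usr/share/bin/php

# RUN is a hypothetical dry-run switch: unset means "echo" (preview only),
# an empty value means the commands are really executed.
RUN=${RUN-echo}

# import_and_archive is a hypothetical helper: it imports every log file
# given as an argument, then runs the archiver once at the end.
import_and_archive() {
    local log
    for log in "$@"; do
        $RUN "$PIWIK_DIR/misc/log-analytics/import_logs.py" \
            --url="$PIWIK_URL" --idsite=1 --recorders=4 "$log" || return 1
    done
    $RUN "$PHP_BIN" "$PIWIK_DIR/misc/cron/archive.php" --url="$PIWIK_URL/"
}

# Example: preview the commands for every log in the Apache log directory.
# import_and_archive /var/log/httpd/*
```

Stopping at the first failed import avoids archiving a partially imported data set.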

6 GB of RAM was in use while the monthly data was being processed in my environment…

That's all! With this configuration, it took about 20 minutes to analyse an access log of 680,000 lines on a 4-core, 4 GB RAM virtual machine.
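As a rough throughput figure, the numbers above work out as follows. This is plain integer arithmetic on the two values quoted in this post, so the results are rounded down.

```shell
lines=680000   # lines in the access log
minutes=20     # approximate import time

echo "$(( lines / minutes )) lines per minute"          # prints 34000 lines per minute
echo "$(( lines / (minutes * 60) )) lines per second"   # prints 566 lines per second
```

With 4 recorders, that is on the order of 140 lines per second per recorder.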
Processing speed seems to increase in proportion to the number of CPU cores, and, unbelievably, post-processing (archiving) requires about as much RAM as the amount of data being processed.
It's the slowest of all the analytics software out there, and I'm still looking for ways to make it faster.
Feel free to leave a comment if you know a way to improve Piwik's performance, or about anything else.