Linux Sysadmin Blog

Install Apache Solr and Tomcat for Drupal


Here’s my quick install guide for Solr, Tomcat, and the Drupal ApacheSolr module for multiple sites. I based the steps below mostly on these sites: wiki.apache.org and drupalconnect.com.

Detailed Setup:

  • Drupal 6.19
  • ApacheSolr module 6-1.1
  • Apache Solr PHP Client Library: Rev.22
  • Solr 1.4.1
  • Tomcat 6.0.29
  • SunJDK 6update21
  • RHEL5.5x64

Install Process: Tomcat

  • Create solr user
  • Download Tomcat6
  • Extract to /opt/tomcat. This will be the $CATALINA_HOME directory; you can use any directory you want
  • Edit /opt/tomcat/conf/tomcat-users.xml to enable Tomcat login.  See comments in this file.
<role rolename="manager"/>
<role rolename="admin"/>
<user username="tomcat" password="tomcat" roles="manager,admin"/>
  • Test run your Tomcat: /opt/tomcat/bin/catalina.sh run. Chown all Tomcat files to the solr user (chown -R solr:solr /opt/tomcat). The default server setting uses port 8080; to customize it, edit /opt/tomcat/conf/server.xml. If you encounter the error “BASEDIR environment variable is not defined correctly…”, check the permissions of the .sh files inside /opt/tomcat/bin/ and make them executable (chmod 755 /opt/tomcat/bin/*.sh).
  • Add a startup (init) script. Copy this Tomcat6 init file from Apache.org to /etc/init.d/tomcat6. Check and update variables like the Java home and the Tomcat directory if needed. Add it to startup: /sbin/chkconfig --add tomcat6 and /sbin/chkconfig tomcat6 on. Dependencies: redhat-lsb (or lsb-base?)
  • Visit your Tomcat admin page, e.g. http://localhost:8080
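The Tomcat steps above can be sketched as shell commands. The download URL is an assumption based on the Apache archive layout for Tomcat 6.0.29; the paths are the ones used in this guide.

```shell
# Create the solr user that will own Tomcat
useradd -r solr

# Download and unpack Tomcat 6.0.29 (URL assumed; any mirror works)
cd /opt
wget http://archive.apache.org/dist/tomcat/tomcat-6/v6.0.29/bin/apache-tomcat-6.0.29.tar.gz
tar xzf apache-tomcat-6.0.29.tar.gz
mv apache-tomcat-6.0.29 /opt/tomcat          # $CATALINA_HOME

# Make the scripts executable (avoids the BASEDIR error) and hand over ownership
chmod 755 /opt/tomcat/bin/*.sh
chown -R solr:solr /opt/tomcat

# Foreground test run; Ctrl-C to stop
/opt/tomcat/bin/catalina.sh run
```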

Install Process: Solr

  • Download Solr
  • Extract to temporary location, ex: /opt/apache-solr-1.4.1
  • Copy /opt/apache-solr-1.4.1/dist/apache-solr-1.4.1.war to /opt/tomcat/webapps/solr.war
  • Copy the /opt/apache-solr-1.4.1/example/solr directory to /opt/tomcat/solr. This will be the $SOLR_HOME directory; you can use any directory you want
  • Create file /opt/tomcat/conf/Catalina/localhost/solr.xml with the following configuration.  Make sure paths are correct.
<Context docBase="/opt/tomcat/webapps/solr.war" debug="0" privileged="true" allowLinking="true" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/opt/tomcat/solr" override="true" />
</Context>
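As commands, the Solr deployment above looks like this; the download URL is an assumption based on the Apache archive, and the paths match this guide:

```shell
# Fetch and extract Solr 1.4.1 to a temporary location
cd /opt
wget http://archive.apache.org/dist/lucene/solr/1.4.1/apache-solr-1.4.1.tgz
tar xzf apache-solr-1.4.1.tgz

# Deploy the war into Tomcat and seed $SOLR_HOME from the example config
cp /opt/apache-solr-1.4.1/dist/apache-solr-1.4.1.war /opt/tomcat/webapps/solr.war
cp -r /opt/apache-solr-1.4.1/example/solr /opt/tomcat/solr

# The per-context config file (solr.xml above) goes here
mkdir -p /opt/tomcat/conf/Catalina/localhost
```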

Install Process: ApacheSolr Drupal module and SolrPHP client

Configure Solr for Multi-Core Setup

  • Copy /var/www/site1/sites/all/modules/apachesolr/schema.xml to /opt/tomcat/solr/conf/schema.xml
  • Copy /var/www/site1/sites/all/modules/apachesolr/solrconfig.xml to /opt/tomcat/solr/conf/solrconfig.xml
  • Copy /opt/apache-solr-1.4.1/example/multicore/solr.xml to /opt/tomcat/solr/solr.xml
  • Create directory for each site and copy /opt/tomcat/solr/conf directory to each of them. Example:
mkdir /opt/tomcat/solr/site1
mkdir /opt/tomcat/solr/site2
cp -r /opt/tomcat/solr/conf /opt/tomcat/solr/site1/
cp -r /opt/tomcat/solr/conf /opt/tomcat/solr/site2/
  • Edit /opt/tomcat/solr/solr.xml with the following config:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
<cores adminPath="/admin/cores">
<core name="site1" instanceDir="site1" />
<core name="site2" instanceDir="site2" />
</cores>
</solr>
  • Start or Restart Tomcat: /etc/init.d/tomcat6 start
  • Visit http://localhost:8080/ and go to your Solr app
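Once Tomcat is back up, each core can be sanity-checked from the shell, assuming the default port and the core names above:

```shell
# Each core answers on its own admin URL; both should return a ping response
curl -s http://localhost:8080/solr/site1/admin/ping
curl -s http://localhost:8080/solr/site2/admin/ping
```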

Configure Drupal site:

  • Go to ApacheSolr settings http://localhost/admin/settings/apachesolr
  • Save your config, and if all is good you’ll see the message: Your site has contacted the Apache Solr server.
Solr host name: localhost
Solr port: 8080
Solr path (for site1): /solr/site1
  • Configure your search index.

WordPress 3 Error: Briefly Unavailable for Scheduled Maintenance. Check Back in a Minute.


You’ll get this error when the WordPress automatic update process, via svn or the admin, fails or is incomplete. It leaves a file named “.maintenance” in your WordPress root directory, holding the maintenance flag.

Sample content of .maintenance file: <?php $upgrading = 1282258195; ?>

Just delete or rename that file and resume your update process, or restore your backup first and restart the update. Of course, check what caused the failed or incomplete update. :)
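A minimal sketch of the fix, using a throwaway /tmp directory in place of a real WordPress root:

```shell
# Simulate a WordPress root where a failed update left the flag behind
mkdir -p /tmp/wp-root
echo '<?php $upgrading = 1282258195; ?>' > /tmp/wp-root/.maintenance

# The fix: delete (or rename) the .maintenance flag file
rm -f /tmp/wp-root/.maintenance
```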

Upload or Download Multiple Files (Recursive) Using Command-line FTP


The problem: I can’t upload a directory, including its sub-directories and files, using command-line ftp. I searched for a similar problem, and it looks like you can’t do this with command-line ftp unless you write a script. One solution I found is “lftp”.

# lftp ftp_host
> user ftp_user ftp_pass
> mirror source target      (download entire directory tree)
> mirror -R source target  (reverse mirror; upload entire directory tree)

Useful when transferring files between servers where you only have FTP access.
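The same session can also be run non-interactively, which is handy in cron jobs; ftp_host, ftp_user and ftp_pass are placeholders:

```shell
# Upload an entire local directory tree to the remote server in one command
lftp -u ftp_user,ftp_pass -e "mirror -R source target; quit" ftp_host
```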

Remove Duplicate Packages in CentOS


I got a package dependency issue when updating our 64-bit CentOS server, caused by having two perl packages installed (i386 and x86_64).

perl.i386      4:5.8.8-32.el5_5.1     installed     28 M
perl.x86_64    4:5.8.8-32.el5_5.1     installed     34 M

I tried to remove it using rpm, but that didn’t work (maybe I just don’t know the correct rpm parameters). My solution was yum remove package_name.architecture, e.g. yum remove perl.i386.

Any other shortcuts in deleting duplicate packages?
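For reference, the check and the removal as commands. package-cleanup is one answer to the question above; it ships with the yum-utils package, so that line assumes yum-utils is installed:

```shell
# Show every installed copy of perl with its architecture
rpm -q --queryformat '%{NAME}.%{ARCH} %{VERSION}-%{RELEASE}\n' perl

# Remove only the 32-bit copy
yum remove perl.i386

# Or let yum-utils find duplicate packages directly
package-cleanup --dupes
```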

Surge2010 After Party - Three Martini Chug


I don’t know the name of this gentleman, but he was THE hit at the after-hours party. I believe this was the second or third time he did the trick, at the insistence of the other sysadmins in the audience. It should be noted that there were no more than three women in the crowd.

A Day in the Life of Facebook Operations


Notes from the “A Day in the Life of Facebook Operations” presentation by Tom Cook, Systems Engineer, Facebook at Surge2010 conference.

So far this was the most attended session. Standing room only, and even that was gone before it started.

What do Facebook sysadmins have to support?

  • Monthly 700 million minutes of time spent on fb

  • 6 billion pieces of content updated

  • 3 billion photos

  • 1 million connect implementations

  • 1/2 billion active users

Infrastructure Growth

  • fb reached a limit on leasing datacenter space

  • fb is building their own http://www.facebook.com/prinevilledatacenter

  • currently serving out of California and Virginia

Initially a LAMP stack.  LB -> Web Servers -> Services/Memcached/Databases

Originally facebook was a simple Apache PHP site.  When fb started hitting a limit on this, they started compiling PHP into C++ (HipHop for PHP).

FB claims to be the biggest memcached deployment in the world. They serve 300 terabytes of memcached data out of memory.

Among the MySQL improvements contributed back is flashcache.

Services supported

  • News Feed

  • Search

  • Cache

Service implementation languages

  • C++

  • PHP - front end

  • python

  • Ruby

  • Java

  • Erlang (chat)

How do they talk between these? JSON? SOAP? No, fb implemented Thrift, a lightweight software framework for cross-language development and the common glue behind all Facebook systems.

For Systems, what does fb have to worry about on a daily basis?

  • deployment

  • monitoring

  • data management

  • Core operating updates

Facebook’s OS is… CentOS!

Systems Management

  • Configuration Management

  • CFengine for system management

  • On Demand

Deployments

  • Web Push - new code gets deployed to fb at least once a day. It’s a coordinated push: everyone is aware, the dev team gets notified, and everyone sits on IRC during the push. It is understood by engineers and the rest of the company

    • push software built over on-demand control tools

    • code distributed via internal BitTorrent swarm

    • php gets compiled; the few-hundred-MB binary gets rapidly pushed via BitTorrent

    • it takes one minute to push across the entire network

  • Backend Deployments - only Engineering and Operations. Engineers write, test, and deploy

    • Quickly make performance decisions

    • Expose changes to subset of real traffic

    • No ‘commit and quit’

    • Deeply involved in moving services to production

    • Ops ‘embedded’ into engineering teams

  • Heavy change logging - pinpointing code to every push and change

Monitoring and Metrics of servers and performance at facebook

  • Ganglia - aggregated metrics

    • fast

    • straightforward

    • nested grids & pools

    • over 5 million monitored metrics

  • facebook inhouse monitoring system

Monitoring - facebook still uses Nagios!

To manage complexity and the number of alarms and systems monitoring the fb team uses aggregation.  Initially alarms were managed by email.

Scribe - a high-performance logging application. They initially used syslog. They also use Hadoop and Hive.

How does it all work and get done?

  • clear delineation of dependencies and responsibilities

  • Constant Failure

  • Servers were the first line of defense, then started focusing on racks

  • Now the focus is on clusters. Logical delineation based on function (web, db, feed, etc.)

  • Next stage is datacenters - what to do if a natural disaster strikes?

  • Constant Communication - information is shared constantly.

    • IRC

    • lots of automated bots, get and set data

    • internal news updates

    • “Headers” on internal tools

    • Change log/feeds

  • Small teams

Interesting fact - each fb server gets an update on average every eight minutes.

The busiest day for FB is the day after Halloween :)

URLS to check out:

facebook.com/engineering

facebook.com/opensource

Keynote Summary of Surge Conference


Where can the very young industry of Web Ops learn from? Keynote - John Allspaw from Surge Conference.

Interesting resources which aren’t about the web but relate to systems engineering:

  • Complex Systems Fail in Funny Ways - Normal Accidents: Living with High-Risk Technologies

  • An Engineer’s View of Human Error - human error comes in different flavors

  • RAD Lab

Why do systems engineers have it easy?

  • No one dies when the site goes down

  • If we make a mistake we can recover quickly

  • Future is bright

John concludes in these calls to actions:

  • Look around! NASA; go to related non-web conferences, for example on systems safety

  • Engage Academia

  • It’s time for our field to get more sophisticated

The Planet and Softlayer to Merge


Looks like consolidation in the hosting space continues, as SoftLayer is going to be under the same ownership as “The Planet”.

For those who are not aware, GI also owns a large stake in The Planet, and I now have the pleasure of telling you that we are in discussions to merge The Planet with SoftLayer. The goal is for the transaction to be complete in the fourth quarter of this year.

I know you have a lot of questions. To be perfectly honest, we are at the beginning of this and don’t have all the answers. Teams at both companies have just begun discussing how to integrate our organizations, and their work will yield the details we don’t yet know.