11. Learning Solr: Installing SolrCloud

SolrCloud deployment architecture diagram

 

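When a document is sent to a host for indexing, the system first determines whether that host is a replica or a leader.

1) If the current node is a replica, the document is forwarded to the leader for processing.

2) If the current node is a leader, SolrCloud determines which shard the document belongs to and sends it to that shard's leader. On receiving the request, that leader indexes the document and then sends the index data to itself and all of its replicas.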
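
The two cases above can be traced with a small, self-contained simulation. This is only a sketch under stated assumptions: the class `ShardRoutingSketch`, its method names, and the shard layout in `main` are hypothetical, and the shard is chosen with `String.hashCode()` modulo the shard count, which is a stand-in for SolrCloud's real hash-range routing rather than a copy of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShardRoutingSketch {

    /**
     * Picks the index of the target shard from the document id.
     * Simplification: String.hashCode() modulo the shard count,
     * not SolrCloud's real hash-range routing.
     */
    static int shardIndexFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /**
     * Returns the nodes a document visits, starting with the node that received it
     * and ending with the leader of the shard that will index it.
     */
    static List<String> route(String receivingNode, String leaderOfReceivingNode,
                              String docId, List<String> shardLeaders) {
        List<String> hops = new ArrayList<>();
        hops.add(receivingNode);
        String current = receivingNode;

        // Case 1: the receiving node is a replica, so it forwards to its own leader.
        if (!current.equals(leaderOfReceivingNode)) {
            current = leaderOfReceivingNode;
            hops.add(current);
        }

        // Case 2: a leader works out the target shard and forwards to that shard's leader.
        String targetLeader = shardLeaders.get(shardIndexFor(docId, shardLeaders.size()));
        if (!current.equals(targetLeader)) {
            hops.add(targetLeader);
        }
        return hops;
    }

    public static void main(String[] args) {
        // Hypothetical layout: two shards, led by the first two hosts of this tutorial.
        List<String> shardLeaders = Arrays.asList("192.168.2.35:8080", "192.168.2.36:8080");
        // 192.168.2.37 is assumed to be a replica whose leader is 192.168.2.35.
        List<String> hops = route("192.168.2.37:8080", "192.168.2.35:8080", "20000", shardLeaders);
        System.out.println("hops=>" + hops);
    }
}
```

The last entry of the returned list is the leader that indexes the document and then passes the update on to its replicas; that final fan-out is not modeled here.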

I. Installing ZooKeeper

cloud05 192.168.2.35 zookeeper

cloud06 192.168.2.36 zookeeper

cloud07 192.168.2.37 zookeeper

For the detailed installation steps, see 《Zookeeper的安装》 (Installing ZooKeeper).

II. Installing Solr 4.7

solr1 192.168.2.35  
solr1 192.168.2.36  
solr1 192.168.2.37

(1) Download Solr 4.7
http://apache.dataguru.cn/lucene/solr/4.7.2/

(2) Create solrhome
mkdir -p /home/hadoop/app/solrcloud/solrhome

(3) Extract the archive
tar -zxvf solr-4.7.2.tar.gz

(4) Copy solr.war to solr1
cp solr.war /home/hadoop/app/solrcloud/solr1/webapps/
(5) Copy the collection1 directory, solr.xml, and zoo.cfg from solr-4.7.2/example/solr into the solrhome directory
cd solr-4.7.2/example/solr
cp -R ./* /home/hadoop/app/solrcloud/solrhome
(6) Copy the dependency files Solr needs at startup
* Copy the jars in example/lib/ext onto the project's classpath. You can put them in %TOMCAT_HOME%/lib,
or in the project's own lib directory (on my machine that is /webapps/solr/WEB-INF/lib);
* Copy example/resources/log4j.properties onto the classpath as well (I created a classes directory under webapps/solr/
and put log4j.properties there);
(7) Configure the environment variables
vi bin/catalina.sh

export SOLR_HOME=/home/hadoop/app/solrcloud  
export JAVA_OPTS="$JAVA_OPTS -server -Xmx1024m -Xms512m -Dsolr.solr.home=$SOLR_HOME/solrhome/"  
export PATH=$PATH:$JAVA_HOME/bin  
export CLASSPATH=$JAVA_HOME/lib  
export CATALINA_HOME=$SOLR_HOME/solr1  
export CATALINA_BASE=$SOLR_HOME/solr1  

(8) Start the server
(9) Verify the server
http://192.168.2.35:8080/solr
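
Opening the URL in a browser is enough for the check above, but the same verification can be scripted. The following is a minimal sketch using only the JDK; the class name `SolrAdminCheck`, the method `isReachable`, and the 3-second timeout are assumptions made for illustration, and any 2xx or 3xx response is treated as "up".

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class SolrAdminCheck {

    /** Returns true if an HTTP GET on the URL answers with a 2xx or 3xx status within the timeout. */
    static boolean isReachable(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            return status >= 200 && status < 400;
        } catch (IOException e) {
            // Malformed URL, connection refused, timeout: all count as "not reachable".
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Defaults to the address used in this tutorial; pass another URL as the first argument.
        String url = args.length > 0 ? args[0] : "http://192.168.2.35:8080/solr";
        System.out.println(url + " reachable=>" + isReachable(url, 3000));
    }
}
```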

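III. Configuring the Solr cluster
vi bin/catalina.sh
First machine (192.168.2.35_solr1):
The collection and its initial shards are created automatically, so there is no need to upload the configuration files to ZooKeeper manually and link them to the collection.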

export SOLR_HOME=/home/hadoop/app/solrcloud
JAVA_OPTS="-Djetty.port=8080 -Dbootstrap_confdir=$SOLR_HOME/solrhome/collection1/conf -Dcollection.configName=myconf -DzkHost=192.168.2.35:2181,192.168.2.36:2181,192.168.2.37:2181 -DnumShards=2"
export JAVA_OPTS="$JAVA_OPTS -server -Xmx1024m -Xms512m -Dsolr.solr.home=$SOLR_HOME/solrhome/"
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib
export CATALINA_HOME=$SOLR_HOME/solr1
export CATALINA_BASE=$SOLR_HOME/solr1

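This step uploads the cluster configuration (conf) to ZooKeeper, so the configuration files do not need to be specified again when the next node is started.
Also: in /home/hadoop/app/solrcloud/solrhome/solr.xml, change the jetty.port default to the port of the corresponding Tomcat server (do the same for the other Solr instances):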

<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8080}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>

Second machine (192.168.2.36_solr1):
JAVA_OPTS="-Djetty.port=8080 -DzkHost=192.168.2.35:2181,192.168.2.36:2181,192.168.2.37:2181 -DnumShards=2"
This creates 2 shards, one on each of the 2 nodes. If you add another node, it is attached to one of the existing shards as a replica; no new shard is created.
Third machine (192.168.2.37_solr1):
JAVA_OPTS="-Djetty.port=8080 -DzkHost=192.168.2.35:2181,192.168.2.36:2181,192.168.2.37:2181 -DnumShards=2"
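
To make the effect of -DnumShards=2 with three nodes concrete, here is a toy model of the behaviour described above. It is a sketch under assumptions, not Solr's actual assignment code: the class `ShardAssignmentSketch` is hypothetical, each joining node is simply placed on the shard that currently has the fewest members, and the first member of a shard is labelled its leader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShardAssignmentSketch {

    /** Assigns nodes, in start order, to numShards shards; the first node on a shard is its leader. */
    static Map<String, List<String>> assign(int numShards, List<String> nodesInStartOrder) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be at least 1");
        }
        Map<String, List<String>> shards = new LinkedHashMap<>();
        for (int i = 1; i <= numShards; i++) {
            shards.put("shard" + i, new ArrayList<>());
        }
        for (String node : nodesInStartOrder) {
            // Assumed rule: join the shard with the fewest members (ties go to the lowest shard number).
            List<String> smallest = null;
            for (List<String> members : shards.values()) {
                if (smallest == null || members.size() < smallest.size()) {
                    smallest = members;
                }
            }
            String role = smallest.isEmpty() ? "leader" : "replica";
            smallest.add(node + " (" + role + ")");
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("192.168.2.35:8080", "192.168.2.36:8080", "192.168.2.37:8080");
        // prints {shard1=[192.168.2.35:8080 (leader), 192.168.2.37:8080 (replica)], shard2=[192.168.2.36:8080 (leader)]}
        System.out.println(assign(2, nodes));
    }
}
```

Which shard the third node actually joins in a real cluster is decided by SolrCloud; the model only shows that the shard count stays at 2 and the extra node becomes a replica.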

IV. Accessing the SolrCloud cluster from a Java program through the API

(1) Project structure


(2) Code for obtaining the SolrCloud instance

package com.solr.common;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class CloudServer {

    private static Log logger = LogFactory.getLog(CloudServer.class);
    public static final int zkClientTimeout = 20000;
    public static final int zkConnectTimeout = 20000;

    private static CloudSolrServer server;

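    /**
     * Returns the shared CloudSolrServer instance, creating and connecting it on first use.
     * 
     * @param zkHost comma-separated ZooKeeper addresses
     * @param collection name of the default collection
     * @return the shared CloudSolrServer instance
     */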
    public static synchronized CloudSolrServer getInstance(final String zkHost, String collection) {

        if (null == server) {
            try {
                logger.info("Creating the CloudSolrServer instance ...");
                server = new CloudSolrServer(zkHost);
                server.setDefaultCollection(collection);
                server.setZkClientTimeout(zkClientTimeout);
                server.setZkConnectTimeout(zkConnectTimeout);
                server.connect();
                logger.info("The cloud Server has been connected !!!!");
            } catch (Exception e) {
                logger.error("The cloud Server has been errored !!!!", e);
            }
        }
        return server;
    }

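    public static synchronized void shutdown() {
        // Guard against shutdown() being called before getInstance()
        if (server != null) {
            server.shutdown();
            server = null;
        }
    }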

    /**
     * Adds documents to the index.
     * 
     * @param server the CloudSolrServer to index into
     */
    public static void add(CloudSolrServer server) {
        SolrInputDocument doc = null;
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        try {
            long startTime = System.currentTimeMillis();
            int len =4;
            for (int i = 0; i < len; i++) {
                doc = new SolrInputDocument();
                doc.addField("id", "2000"+i);
                doc.addField("title", "shenfl" + i);
                docs.add(doc);
                if (i != 0 && i % 5000 == 0) {
                    logger.info("create index total count:" + i);
                    server.add(docs);
                    server.commit();
                    docs.clear();
                }
            }
            logger.info("docs.size=>" + docs.size());
            if (docs.size() > 0) {
                server.add(docs);
                server.commit();
            }
            logger.info("the cloud add index time=>" + (System.currentTimeMillis() - startTime) / 1000 + "s");
        } catch (Exception e) {
            logger.error("the cloud add is exception!!!", e);
            e.printStackTrace();
        }
    }
    /**
     * Queries the index.
     * 
     * @param solr the CloudSolrServer to query
     * @param query
     *            the query string
     */
    public static SolrDocumentList query(CloudSolrServer solr, String query) {

        SolrQuery params = new SolrQuery();
        SolrDocumentList docs = null;
        try {
            params.setQuery(query);
            params.setStart(0);
            // Return up to 20 rows (Solr's default is 10)
            params.setRows(20);
            QueryResponse response = solr.query(params);
            docs = response.getResults();
            logger.info("Query Time=>" + response.getQTime() + "ms");
        } catch (Exception e) {
            logger.error("The cloud Server query failure !!!!", e);
            e.printStackTrace();
        }
        return docs;
    }
}
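
The add method above buffers documents and flushes them in batches. Because it tests `i != 0 && i % 5000 == 0` after adding document i, the first flush would carry 5001 documents and later ones 5000. A minimal sketch of the same idea that counts the buffer size instead is shown below; `BatchingSketch` and `addInBatches` are hypothetical names, and the `sink` callback stands in for the `server.add(docs)` and `server.commit()` calls so that the example runs without a Solr cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchingSketch {

    /**
     * Hands items to the sink in batches of at most batchSize and returns the number of flushes.
     * In the indexing code, the sink would call server.add(batch) followed by server.commit().
     */
    static <T> int addInBatches(Iterable<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int flushes = 0;
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                // Pass a copy so the sink can keep the list after the buffer is cleared.
                sink.accept(new ArrayList<>(batch));
                batch.clear();
                flushes++;
            }
        }
        // Flush whatever is left over after the loop.
        if (!batch.isEmpty()) {
            sink.accept(new ArrayList<>(batch));
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            ids.add("2000" + i);
        }
        int flushes = addInBatches(ids, 5, batch -> System.out.println("flush size=>" + batch.size()));
        System.out.println("flushes=>" + flushes);
    }
}
```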

(3) Accessing the ZooKeeper cluster from Java through the API

package com.solr.common;

import java.util.Iterator;
import java.util.Set;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.ZkStateReader;
import org.junit.Test;
/**
 * <p>
 * Tests the SolrCloud cluster
 * </p>
 * @author shenfl
 *
 */
public class CloudServerTest {
    static CloudSolrServer solr = null;
    static {
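        // The default port is 2181, so a zkHost entry can also be written as just 192.168.2.35
        // Test: stop the ZooKeeper node on 192.168.2.35 and run against the cluster. It still works, which shows that the SolrCloud API supports the ensemble directly; the cluster stays accessible as long as a majority of the ZooKeeper nodes are up
        // Could not connect to ZooKeeper 192.168.2.35:2181 within 20000 ms
        String zkHost = "192.168.2.35:2181,192.168.2.36,192.168.2.37";
        String defaultCollection = "collection1";
        // Get the instance; the ZooKeeper addresses are comma-separated, and the wrapper already supports a multi-node ensemble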
        solr = CloudServer.getInstance(zkHost, defaultCollection);
    }
    /**
     * Tests the connection to the distributed ZooKeeper service
     */
    @Test
    public void testConnectCloudServer() {

        // Read the cluster state from ZooKeeper
        ZkStateReader zkStateReader = solr.getZkStateReader();
        ClusterState clusterState = zkStateReader.getClusterState();
        System.out.println("clusterState=>" + clusterState + "\n");

        // List the live nodes registered in ZooKeeper
        Set<String> liveNodes = clusterState.getLiveNodes();
        for(String value:liveNodes){
            System.out.println("liveNode=>" + value);
        }

        // List all collections registered in ZooKeeper
        Set<String> collections = clusterState.getCollections();
        Iterator<String> iterator = collections.iterator();
        while(iterator.hasNext()){
            String value = iterator.next();
            System.out.println("collection=>"+ value);
        }

        // Get the version of the cluster state stored in ZooKeeper
        Integer zkClusterStateVersion = clusterState.getZkClusterStateVersion();
        System.out.println("zkClusterStateVersion=>" + zkClusterStateVersion);

    }
    /**
     * Adds documents to the index on the Solr servers through the ZooKeeper-aware client
     */
    @Test
    public void testAdd() {
        // Add documents to the index
        CloudServer.add(solr);
    }
    @Test
    public void testQuery() {
        SolrDocumentList docs = CloudServer.query(solr, "*:*");
        // Print the query results
        System.out.println("NumFound=>" + docs.getNumFound());
        String name = null;
        String id = null;
        for (SolrDocument doc : docs) {
            name = doc.getFieldValues("title").toString();
            id = doc.getFieldValue("id").toString();
            System.out.println("name:" + name + ",id:" + id );
        }
    }
}
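
The comment in the test class about stopping one ZooKeeper node comes down to simple arithmetic: an ensemble keeps serving requests only while a strict majority of its nodes is up. The sketch below just illustrates that calculation; the class `ZkQuorumSketch` and its method names are hypothetical and are not part of the ZooKeeper or SolrJ API.

```java
class ZkQuorumSketch {

    /** Smallest number of nodes that forms a strict majority of the ensemble. */
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of nodes that can fail while a majority is still available. */
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    /** True if the live nodes still form a majority. */
    static boolean hasQuorum(int ensembleSize, int liveNodes) {
        return liveNodes >= majority(ensembleSize);
    }

    public static void main(String[] args) {
        int ensembleSize = 3; // the three ZooKeeper hosts used in this tutorial
        System.out.println("majority=>" + majority(ensembleSize));                   // majority=>2
        System.out.println("tolerableFailures=>" + tolerableFailures(ensembleSize)); // tolerableFailures=>1
        System.out.println("quorum with 2 live=>" + hasQuorum(ensembleSize, 2));     // quorum with 2 live=>true
        System.out.println("quorum with 1 live=>" + hasQuorum(ensembleSize, 1));     // quorum with 1 live=>false
    }
}
```

With three nodes, stopping the one on 192.168.2.35 leaves two of three running, which is still a majority; stopping a second node would not be.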

(4) Test results for the index query

4.1 Content on the server:

<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">514</int>
  <lst name="params">
    <str name="indent">true</str>
    <str name="q">*:*</str>
    <str name="_">1420202618950</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="8" start="0" maxScore="1.0">
  <doc>
    <str name="id">10000</str>
    <arr name="title">
      <str>50元话费</str>
    </arr>
    <long name="_version_">1489161523793231872</long></doc>
  <doc>
    <str name="id">10001</str>
    <arr name="title">
      <str>100元话费</str>
    </arr>
    <long name="_version_">1489161540326129664</long></doc>
  <doc>
    <str name="id">1111</str>
    <arr name="title">
      <str>xxxxx</str>
    </arr>
    <long name="_version_">1489170482206867456</long></doc>
  <doc>
    <str name="id">111111</str>
    <arr name="title">
      <str>change.me</str>
    </arr>
    <long name="_version_">1489170520961187840</long></doc>
  <doc>
    <str name="id">20000</str>
    <arr name="title">
      <str>shenfl0</str>
    </arr>
    <long name="_version_">1489173288229797888</long></doc>
  <doc>
    <str name="id">20001</str>
    <arr name="title">
      <str>shenfl1</str>
    </arr>
    <long name="_version_">1489173288241332224</long></doc>
  <doc>
    <str name="id">20002</str>
    <arr name="title">
      <str>shenfl2</str>
    </arr>
    <long name="_version_">1489173288242380800</long></doc>
  <doc>
    <str name="id">20003</str>
    <arr name="title">
      <str>shenfl3</str>
    </arr>
    <long name="_version_">1489173288242380801</long></doc>
</result>
</response>

4.2 Result returned by the program:

NumFound=>8
name:[50元话费],id:10000
name:[100元话费],id:10001
name:[xxxxx],id:1111
name:[change.me],id:111111
name:[shenfl0],id:20000
name:[shenfl1],id:20001
name:[shenfl2],id:20002
name:[shenfl3],id:20003