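SolrCloud deployment architecture diagram
When a document is sent to a host for indexing, the system first determines whether that host is a replica or a leader.
1) If the current node is a replica, the document is forwarded to the leader for processing.
2) If the current node is a leader, SolrCloud determines which shard the document belongs to and sends it to that shard's leader. The shard leader processes the document and then sends the index data to itself and to all of its replicas.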
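The forwarding rules above can be sketched as a toy model. This is only an illustration under stated assumptions: the class `RoutingSketch`, its `Shard` type, and the `shardFor` and `route` methods are hypothetical names, and the shard is picked with a plain `hashCode` modulo rather than whatever hashing scheme Solr really applies to document ids.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the update routing described above; not Solr's real implementation. */
final class RoutingSketch {

    /** One shard: a leader node plus zero or more replica nodes. */
    static final class Shard {
        final String leader;
        final List<String> replicas;

        Shard(String leader, List<String> replicas) {
            this.leader = leader;
            this.replicas = replicas;
        }
    }

    /** Picks a shard index for a document id (assumption: hashCode modulo shard count). */
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    /** Returns every node the document visits, starting with the node that received it. */
    static List<String> route(String docId, String receivingNode, List<Shard> shards) {
        List<String> hops = new ArrayList<>();
        hops.add(receivingNode);
        String current = receivingNode;
        // Step 1: a replica hands the document to the leader of its own shard.
        for (Shard shard : shards) {
            if (shard.replicas.contains(current)) {
                current = shard.leader;
                hops.add(current);
                break;
            }
        }
        // Step 2: that leader works out the owning shard and forwards to its leader if needed.
        Shard target = shards.get(shardFor(docId, shards.size()));
        if (!current.equals(target.leader)) {
            current = target.leader;
            hops.add(current);
        }
        // Step 3: the owning leader indexes the document and sends it to all of its replicas.
        hops.addAll(target.replicas);
        return hops;
    }

    public static void main(String[] args) {
        List<Shard> shards = List.of(
                new Shard("192.168.2.35", List.of("192.168.2.37")),
                new Shard("192.168.2.36", List.of()));
        System.out.println(route("20001", "192.168.2.37", shards));
    }
}
```

Note that a replica can appear twice in the returned path: once as the node that first received the document and once as a target of its leader's distribution.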
1. ZooKeeper installation
cloud05 192.168.2.35 zookeeper
cloud06 192.168.2.36 zookeeper
cloud07 192.168.2.37 zookeeper
For the detailed installation steps, see 《Zookeeper的安装》 (Installing ZooKeeper).
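A ZooKeeper ensemble keeps serving requests only while a strict majority of its servers is up, which is why an odd number such as three is a common choice. The arithmetic can be sketched as follows; `QuorumSketch` and its method names are hypothetical helpers for illustration, not part of ZooKeeper.

```java
/** Majority arithmetic for a ZooKeeper ensemble (illustrative helper, not a ZooKeeper API). */
final class QuorumSketch {

    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can fail while a majority is still available. */
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        int n = 3; // the three ZooKeeper hosts listed above
        // prints "quorum=2, tolerable failures=1"
        System.out.println("quorum=" + quorumSize(n) + ", tolerable failures=" + tolerableFailures(n));
    }
}
```

With three servers the ensemble survives the loss of one; a fourth server would raise the quorum to three without tolerating any additional failure.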
2. Solr 4.7 installation
solr1 192.168.2.35
solr1 192.168.2.36
solr1 192.168.2.37
(1) Download Solr 4.7
http://apache.dataguru.cn/lucene/solr/4.7.2/
(2) Create solrhome
mkdir -p /home/hadoop/app/solrcloud/solrhome
(3) Extract the archive
tar -zxvf solr-4.7.2.tar.gz
(4) Copy solr.war to solr1 (the Tomcat instance directory)
cp solr.war /home/hadoop/app/solrcloud/solr1/webapps/
(5) Copy the collection1 directory, solr.xml, and zoo.cfg from solr-4.7.2/example/solr into the solrhome directory
cd solr-4.7.2/example/solr
cp -R ./* /home/hadoop/app/solrcloud/solrhome
(6) Copy the dependencies Solr needs at startup
* Copy the jars in example/lib/ext onto the web application's classpath. You can put them in %TOMCAT_HOME%/lib,
or in the application's own lib directory (on my machine, /webapps/solr/WEB-INF/lib);
* Copy example/resources/log4j.properties onto the classpath as well (I created a classes directory under webapps/solr/
and put log4j.properties in it);
(7) Configure the environment variables
vi bin/catalina.sh
export SOLR_HOME=/home/hadoop/app/solrcloud
export JAVA_OPTS="$JAVA_OPTS -server -Xmx1024m -Xms512m -Dsolr.solr.home=$SOLR_HOME/solrhome/"
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib
export CATALINA_HOME=$SOLR_HOME/solr1
export CATALINA_BASE=$SOLR_HOME/solr1
(8) Start the server
(9) Verify the server
http://192.168.2.35:8080/solr
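Besides opening the URL in a browser, the check can be scripted. The following is a minimal sketch using only the JDK's `HttpURLConnection`; the class `SolrPing` and its `solrUrl` and `probe` methods are hypothetical names, and the 5-second timeout is an arbitrary assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Probes the Solr admin URL over HTTP (illustrative helper, not part of Solr). */
final class SolrPing {

    /** Builds the admin URL in the same form as the one used above. */
    static String solrUrl(String host, int port) {
        return "http://" + host + ":" + port + "/solr";
    }

    /** Returns the HTTP status code, or -1 if the server could not be reached. */
    static int probe(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            // Covers malformed URLs, refused connections and timeouts alike.
            return -1;
        }
    }

    public static void main(String[] args) {
        String url = solrUrl("192.168.2.35", 8080);
        int status = probe(url, 5000);
        System.out.println(url + " -> " + (status == -1 ? "unreachable" : "HTTP " + status));
    }
}
```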
3. Configuring the Solr cluster
vi bin/catalina.sh
First machine (192.168.2.35_solr1):
The collection and its initial shards are created automatically, so there is no need to upload the configuration files to ZooKeeper and link them to the collection by hand.
export SOLR_HOME=/home/hadoop/app/solrcloud
JAVA_OPTS="-Djetty.port=8080 -Dbootstrap_confdir=$SOLR_HOME/solrhome/collection1/conf -Dcollection.configName=myconf -DzkHost=192.168.2.35:2181,192.168.2.36:2181,192.168.2.37:2181 -DnumShards=2"
export JAVA_OPTS="$JAVA_OPTS -server -Xmx1024m -Xms512m -Dsolr.solr.home=$SOLR_HOME/solrhome/"
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib
export CATALINA_HOME=$SOLR_HOME/solr1
export CATALINA_BASE=$SOLR_HOME/solr1
This step uploads the cluster's configuration (conf) to ZooKeeper, so the remaining nodes do not need to specify the configuration files when they start.
Note: in /home/hadoop/app/solrcloud/solrhome/solr.xml, change the jetty.port default to the port of the corresponding Tomcat server (do the same for the other Solr instances):
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8080}</int>
    <str name="hostContext">${hostContext:solr}</str>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
  </solrcloud>
  <shardHandlerFactory name="shardHandlerFactory"
                       class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>
Second machine (192.168.2.36_solr1):
JAVA_OPTS="-Djetty.port=8080 -DzkHost=192.168.2.35:2181,192.168.2.36:2181,192.168.2.37:2181 -DnumShards=2"
This creates 2 shards spread across the 2 nodes. If you then add another node, it attaches to one of the existing shards as a replica instead of creating a new shard.
Third machine (192.168.2.37_solr1):
JAVA_OPTS="-Djetty.port=8080 -DzkHost=192.168.2.35:2181,192.168.2.36:2181,192.168.2.37:2181 -DnumShards=2"
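The way the three nodes end up distributed with numShards=2 can be sketched as a simplified model. It assumes that nodes create new shards until numShards is reached, that later nodes join the shard with the fewest members, and that the first member of a shard acts as its leader. `ShardAssignmentSketch` and its methods are hypothetical names; the real assignment and leader election are coordinated through ZooKeeper and may behave differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of how joining nodes are spread over a fixed number of shards. */
final class ShardAssignmentSketch {
    private final int numShards;
    private final Map<String, List<String>> shards = new LinkedHashMap<>();

    ShardAssignmentSketch(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be at least 1");
        }
        this.numShards = numShards;
    }

    /** Registers a node and returns the name of the shard it joined. */
    String join(String node) {
        String chosen = null;
        if (shards.size() < numShards) {
            // Still short of numShards: this node starts a new shard.
            chosen = "shard" + (shards.size() + 1);
            shards.put(chosen, new ArrayList<>());
        } else {
            // All shards exist: join the one with the fewest members (first one wins ties).
            for (Map.Entry<String, List<String>> entry : shards.entrySet()) {
                if (chosen == null || entry.getValue().size() < shards.get(chosen).size()) {
                    chosen = entry.getKey();
                }
            }
        }
        shards.get(chosen).add(node);
        return chosen;
    }

    /** In this model the first node that joined a shard is its leader. */
    String leaderOf(String shard) {
        return shards.get(shard).get(0);
    }

    /** Every member of the shard except the leader. */
    List<String> replicasOf(String shard) {
        List<String> nodes = shards.get(shard);
        return new ArrayList<>(nodes.subList(1, nodes.size()));
    }

    public static void main(String[] args) {
        ShardAssignmentSketch cluster = new ShardAssignmentSketch(2);
        for (String node : new String[] {"192.168.2.35", "192.168.2.36", "192.168.2.37"}) {
            System.out.println(node + " joined " + cluster.join(node));
        }
    }
}
```

In this model the first two machines each start a shard and the third joins shard1 as a replica.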
4. Accessing the SolrCloud cluster from a Java program through the API
(1) Project structure
(2) Code for obtaining the SolrCloud instance
package com.solr.common;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

public class CloudServer {
    private static final Log logger = LogFactory.getLog(CloudServer.class);
    public static final int zkClientTimeout = 20000;
    public static final int zkConnectTimeout = 20000;
    private static CloudSolrServer server;

    /**
     * Returns the shared CloudSolrServer instance, creating it on first use.
     *
     * @param zkHost     comma-separated ZooKeeper connection string
     * @param collection default collection for requests
     * @return the shared instance
     */
    public static synchronized CloudSolrServer getInstance(final String zkHost, String collection) {
        if (null == server) {
            try {
                server = new CloudSolrServer(zkHost);
                logger.info("The Cloud SolrServer Instance has been created!");
                server.setDefaultCollection(collection);
                server.setZkClientTimeout(zkClientTimeout);
                server.setZkConnectTimeout(zkConnectTimeout);
                server.connect();
                logger.info("The cloud Server has been connected !!!!");
            } catch (Exception e) {
                logger.error("The cloud Server has been errored !!!!", e);
            }
        }
        return server;
    }

    /** Shuts down the shared instance, if one was created. */
    public static synchronized void shutdown() {
        if (server != null) {
            server.shutdown();
            server = null;
        }
    }

    /**
     * Adds test documents to the index.
     *
     * @param server the SolrCloud client to index through
     */
    public static void add(CloudSolrServer server) {
        SolrInputDocument doc = null;
        List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        try {
            long startTime = System.currentTimeMillis();
            int len = 4;
            for (int i = 0; i < len; i++) {
                doc = new SolrInputDocument();
                doc.addField("id", "2000" + i);
                doc.addField("title", "shenfl" + i);
                docs.add(doc);
                // Send and commit the buffered documents every 5000 iterations.
                if (i != 0 && i % 5000 == 0) {
                    logger.info("create index total count:" + i);
                    server.add(docs);
                    server.commit();
                    docs.clear();
                }
            }
            logger.info("docs.size=>" + docs.size());
            // Send whatever is left in the buffer.
            if (docs.size() > 0) {
                server.add(docs);
                server.commit();
            }
            logger.info("the cloud add index time=>" + (System.currentTimeMillis() - startTime) / 1000 + "s");
        } catch (Exception e) {
            logger.error("the cloud add is exception!!!", e);
        }
    }

    /**
     * Queries the index.
     *
     * @param solr  the SolrCloud client to query through
     * @param query the query string
     * @return the matching documents, or null if the query failed
     */
    public static SolrDocumentList query(CloudSolrServer solr, String query) {
        SolrQuery params = new SolrQuery();
        SolrDocumentList docs = null;
        try {
            params.setQuery(query);
            params.setStart(0);
            // The default number of rows is 10.
            params.setRows(20);
            QueryResponse response = solr.query(params);
            docs = response.getResults();
            logger.info("Query Time=>" + response.getQTime() + "ms");
        } catch (Exception e) {
            logger.error("The cloud Server query failure !!!!", e);
        }
        return docs;
    }
}
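The buffering in the add method above (collect documents, send them in chunks, then send the remainder) can be isolated into a small generic helper. This is a sketch only: `BatchSketch` and `flushInBatches` are hypothetical names, and the `Consumer` stands in for the `server.add(docs)` plus `server.commit()` pair. Unlike the original loop, it flushes as soon as the buffer holds exactly `batchSize` items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Generic batch-and-flush helper (illustrative; the sink stands in for add + commit). */
final class BatchSketch {

    /** Passes items to the sink in chunks of at most batchSize; returns the number of flushes. */
    static <T> int flushInBatches(Iterable<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int flushes = 0;
        for (T item : items) {
            buffer.add(item);
            if (buffer.size() == batchSize) {
                // Hand over a copy so the sink may keep it after the buffer is cleared.
                sink.accept(new ArrayList<>(buffer));
                buffer.clear();
                flushes++;
            }
        }
        // Flush the final, possibly smaller, batch.
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            ids.add("2000" + i);
        }
        int flushes = flushInBatches(ids, 5, batch -> System.out.println("flush " + batch.size() + " docs"));
        System.out.println("flushes=" + flushes);
    }
}
```

Committing once per batch, as the original code does, keeps memory bounded on large imports at the cost of more commits.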
(3) Accessing the ZooKeeper cluster from Java through the API
package com.solr.common;

import java.util.Iterator;
import java.util.Set;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.ZkStateReader;
import org.junit.Test;

/**
 * <p>
 * Tests the SolrCloud cluster.
 * </p>
 * @author shenfl
 *
 */
public class CloudServerTest {
    static CloudSolrServer solr = null;

    static {
        // The default port is 2181, so a zkHost entry can also be written as just 192.168.2.35.
        // Test: with the ZooKeeper node on 192.168.2.35 disconnected, the cluster still worked, which shows
        // that the SolrCloud API supports the ensemble directly; access works as long as a majority
        // (more than half) of the ZooKeeper nodes are up.
        // Could not connect to ZooKeeper 192.168.2.35:2181 within 20000 ms
        String zkHost = "192.168.2.35:2181,192.168.2.36,192.168.2.37";
        String defaultCollection = "collection1";
        // Get the instance; the ZooKeeper hosts are comma-separated, and multi-node ensembles are already supported.
        solr = CloudServer.getInstance(zkHost, defaultCollection);
    }

    /**
     * Tests the connection to the distributed ZooKeeper service.
     */
    @Test
    public void testConnectCloudServer() {
        // Get the cluster state as seen through ZooKeeper.
        ZkStateReader zkStateReader = solr.getZkStateReader();
        ClusterState clusterState = zkStateReader.getClusterState();
        System.out.println("clusterState=>" + clusterState + "\n");
        // List the live nodes registered in ZooKeeper.
        Set<String> liveNodes = clusterState.getLiveNodes();
        for (String value : liveNodes) {
            System.out.println("liveNode=>" + value);
        }
        // List all collections registered in ZooKeeper.
        Set<String> collections = clusterState.getCollections();
        Iterator<String> iterator = collections.iterator();
        while (iterator.hasNext()) {
            String value = iterator.next();
            System.out.println("collection=>" + value);
        }
        // Get the version of the cluster state.
        Integer zkClusterStateVersion = clusterState.getZkClusterStateVersion();
        System.out.println("zkClusterStateVersion=>" + zkClusterStateVersion);
    }

    /**
     * Adds documents to the Solr servers, locating them through ZooKeeper.
     */
    @Test
    public void testAdd() {
        // Add documents to the index.
        CloudServer.add(solr);
    }

    @Test
    public void testQuery() {
        SolrDocumentList docs = CloudServer.query(solr, "*:*");
        // Print the query results.
        System.out.println("NumFound=>" + docs.getNumFound());
        String name = null;
        String id = null;
        for (SolrDocument doc : docs) {
            name = doc.getFieldValues("title").toString();
            id = doc.getFieldValue("id").toString();
            System.out.println("name:" + name + ",id:" + id);
        }
    }
}
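The connection string in the test mixes entries with and without an explicit port. As a minimal sketch of the default-port convention, the helper below rewrites such a string so that every entry carries a port. `ZkHostSketch` and `normalize` are hypothetical names, and the helper deliberately ignores IPv6 addresses and any path suffix on the connection string.

```java
import java.util.ArrayList;
import java.util.List;

/** Makes the ports in a ZooKeeper connection string explicit (illustrative helper). */
final class ZkHostSketch {
    static final int DEFAULT_ZK_PORT = 2181;

    /** Appends the default port to every comma-separated entry that has none. */
    static String normalize(String zkHost) {
        List<String> parts = new ArrayList<>();
        for (String raw : zkHost.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            parts.add(entry.contains(":") ? entry : entry + ":" + DEFAULT_ZK_PORT);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        // prints "192.168.2.35:2181,192.168.2.36:2181,192.168.2.37:2181"
        System.out.println(normalize("192.168.2.35:2181,192.168.2.36,192.168.2.37"));
    }
}
```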
(4) Query test results
4.1 Content on the server:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">514</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">*:*</str>
<str name="_">1420202618950</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="8" start="0" maxScore="1.0">
<doc>
<str name="id">10000</str>
<arr name="title">
<str>50元话费</str>
</arr>
<long name="_version_">1489161523793231872</long></doc>
<doc>
<str name="id">10001</str>
<arr name="title">
<str>100元话费</str>
</arr>
<long name="_version_">1489161540326129664</long></doc>
<doc>
<str name="id">1111</str>
<arr name="title">
<str>xxxxx</str>
</arr>
<long name="_version_">1489170482206867456</long></doc>
<doc>
<str name="id">111111</str>
<arr name="title">
<str>change.me</str>
</arr>
<long name="_version_">1489170520961187840</long></doc>
<doc>
<str name="id">20000</str>
<arr name="title">
<str>shenfl0</str>
</arr>
<long name="_version_">1489173288229797888</long></doc>
<doc>
<str name="id">20001</str>
<arr name="title">
<str>shenfl1</str>
</arr>
<long name="_version_">1489173288241332224</long></doc>
<doc>
<str name="id">20002</str>
<arr name="title">
<str>shenfl2</str>
</arr>
<long name="_version_">1489173288242380800</long></doc>
<doc>
<str name="id">20003</str>
<arr name="title">
<str>shenfl3</str>
</arr>
<long name="_version_">1489173288242380801</long></doc>
</result>
</response>
4.2 Results returned by the program:
NumFound=>8
name:[50元话费],id:10000
name:[100元话费],id:10001
name:[xxxxx],id:1111
name:[change.me],id:111111
name:[shenfl0],id:20000
name:[shenfl1],id:20001
name:[shenfl2],id:20002
name:[shenfl3],id:20003