Problem reading data with the Spark Cassandra connector in the Spark Java API

I am new to Apache Spark and want to connect Spark to a Cassandra database.

  • Spark version: 2.2.0
  • Cassandra version: 2.1.14

The error occurs on the following line:

long count = javaFunctions(sc).cassandraTable("test", "table1").count();

My Java code is below:

// Imports needed at the top of the class file:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

public SparkTestPanel(String id, User user) {
    super(id);
    form = new Form("form");
    form.setOutputMarkupId(true);
    this.add(form);
    // Local Spark master, pointed at a Cassandra node on localhost
    SparkConf conf = new SparkConf(true);
    conf.setAppName("Spark Test");
    conf.setMaster("local[*]");
    conf.set("spark.cassandra.connection.host", "127.0.0.1");
    conf.set("spark.cassandra.auth.username", "username");
    conf.set("spark.cassandra.auth.password", "password");
    JavaSparkContext sc = null;
    try {
        sc = new JavaSparkContext(conf);
        // Full scan of test.table1 -- this is the line that fails
        long count = javaFunctions(sc).cassandraTable("test", "table1").count();
    } finally {
        if (sc != null) { // avoid an NPE when context creation itself throws
            sc.stop();
        }
    }
}

And my pom.xml is as follows:

    <!-- https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector-java_2.10 -->
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector-java_2.11</artifactId>
        <version>1.5.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.6.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>2.0.3</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>3.2.0</version>
        <type>jar</type>
    </dependency>

And this is the error I get:

[Stage 0:=======================>                                  (4 + 4) / 10]
[Stage 0:>                                                         (0 + 4) / 10]
[Stage 0:=====>                                                    (1 + 4) / 10]
[Stage 0:=================>                                        (3 + 4) / 10]
2017-08-16 12:22:48,857 ERROR Executor:91 - Exception in task 6.0 in stage 0.0 (TID 6)
java.lang.AbstractMethodError: com.datastax.spark.connector.japi.GenericJavaRowReaderFactory$JavaRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$15.apply(CassandraTableScanRDD.scala:345)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$15.apply(CassandraTableScanRDD.scala:345)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:397)
at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1801)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1158)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1158)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2062)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
2017-08-16 12:22:48,861 ERROR TaskSetManager:70 - Task 6 in stage 0.0 failed 1 times; aborting job

Can anyone help?


person Mohamadreza Rostami    schedule 16.08.2017    source
comment
the specified version requires Java 8... check your Java version   -  person undefined_variable    schedule 16.08.2017
comment
my Java version is 1.8.0_131. @undefined_variable   -  person Mohamadreza Rostami    schedule 16.08.2017
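For completeness, the JVM the Spark driver actually runs under can differ from what java -version reports in a shell; a trivial way to check is to print the java.version system property from the driver code itself:

// Should print something starting with "1.8" for these Spark/connector versions
System.out.println("Driver JVM: " + System.getProperty("java.version"));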


Answers (1)


Remove the spark-cassandra-connector-java_2.11 and cassandra-driver-core dependencies. The java_2.11 artifact at version 1.5.2 was compiled against an older connector API, and mixing it with spark-cassandra-connector_2.11 2.0.3 is exactly what produces the AbstractMethodError: the old GenericJavaRowReaderFactory$JavaRowReader does not implement the read(Row, CassandraRowMetadata) method that the 2.0.x scan code calls at runtime. Since 2.0, the Java API is bundled in the main connector artifact, which also pulls in a compatible cassandra-driver-core transitively. Your pom.xml should look like the one below; running mvn dependency:tree afterwards confirms that only one connector artifact remains on the classpath.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>2.0.3</version>
</dependency>
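
With only the unified connector on the classpath, the original read works unchanged. As a minimal standalone sketch of the same call (the class name is illustrative; keyspace, table, and connection settings are taken from the question):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

public class CassandraCountTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf(true)
                .setAppName("Spark Test")
                .setMaster("local[*]")
                .set("spark.cassandra.connection.host", "127.0.0.1")
                .set("spark.cassandra.auth.username", "username")
                .set("spark.cassandra.auth.password", "password");

        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // cassandraTable returns a CassandraJavaRDD<CassandraRow>;
            // count() runs the same table scan that previously aborted.
            long count = javaFunctions(sc).cassandraTable("test", "table1").count();
            System.out.println("Rows in test.table1: " + count);
        } finally {
            sc.stop();
        }
    }
}

Note that javaFunctions now comes from com.datastax.spark.connector.japi.CassandraJavaUtil inside spark-cassandra-connector_2.11 itself, so no separate -java artifact is needed.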
person abaghel    schedule 16.08.2017