Java Queues and Multithreaded Data Processing

Reposted

数据科学探索者 2024-09-30 18:25:49


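Chapter 6: Task Execution

In a concurrent application, avoid allocating a separate thread for every task:

Thread lifecycle overhead is high. When requests arrive at a high rate, creating and tearing down threads consumes significant computing resources and hurts performance.
Resource consumption is high. Once there are more runnable threads than CPUs, some threads must sit idle waiting for a CPU time slice, yet they still hold memory for their state and put pressure on the GC. The extra cost of many threads competing for the CPUs is not negligible either.
Stability suffers. Creating threads without limit makes it hard to keep a server from crashing under heavy load or a malicious attack, so the number of threads an application can create needs to be bounded and thoroughly tested.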

The Executor framework


/**
 * An object that executes submitted {@link Runnable} tasks. This
 * interface provides a way of decoupling task submission from the
 * mechanics of how each task will be run, including details of thread
 * use, scheduling, etc.  An {@code Executor} is normally used
 * instead of explicitly creating threads. For example, rather than
 * invoking {@code new Thread(new(RunnableTask())).start()} for each
 * of a set of tasks, you might use:
 *
 * <pre>
 * Executor executor = <em>anExecutor</em>;
 * executor.execute(new RunnableTask1());
 * executor.execute(new RunnableTask2());
 * ...
 * </pre>
 *
 * However, the {@code Executor} interface does not strictly
 * require that execution be asynchronous. In the simplest case, an
 * executor can run the submitted task immediately in the caller's
 * thread:
 *
 *  <pre> {@code
 * class DirectExecutor implements Executor {
 *   public void execute(Runnable r) {
 *     r.run();
 *   }
 * }}</pre>
 *
 * More typically, tasks are executed in some thread other
 * than the caller's thread.  The executor below spawns a new thread
 * for each task.
 *
 *  <pre> {@code
 * class ThreadPerTaskExecutor implements Executor {
 *   public void execute(Runnable r) {
 *     new Thread(r).start();
 *   }
 * }}</pre>
 *
 * Many {@code Executor} implementations impose some sort of
 * limitation on how and when tasks are scheduled.  The executor below
 * serializes the submission of tasks to a second executor,
 * illustrating a composite executor.
 *
 *  <pre> {@code
 * class SerialExecutor implements Executor {
 *   final Queue<Runnable> tasks = new ArrayDeque<Runnable>();
 *   final Executor executor;
 *   Runnable active;
 *
 *   SerialExecutor(Executor executor) {
 *     this.executor = executor;
 *   }
 *
 *   public synchronized void execute(final Runnable r) {
 *     tasks.offer(new Runnable() {
 *       public void run() {
 *         try {
 *           r.run();
 *         } finally {
 *           scheduleNext();
 *         }
 *       }
 *     });
 *     if (active == null) {
 *       scheduleNext();
 *     }
 *   }
 *
 *   protected synchronized void scheduleNext() {
 *     if ((active = tasks.poll()) != null) {
 *       executor.execute(active);
 *     }
 *   }
 * }}</pre>
 *
 * The {@code Executor} implementations provided in this package
 * implement {@link ExecutorService}, which is a more extensive
 * interface.  The {@link ThreadPoolExecutor} class provides an
 * extensible thread pool implementation. The {@link Executors} class
 * provides convenient factory methods for these Executors.
 *
 * <p>Memory consistency effects: Actions in a thread prior to
 * submitting a {@code Runnable} object to an {@code Executor}
 * <a href="package-summary.html#MemoryVisibility"><i>happen-before</i></a>
 * its execution begins, perhaps in another thread.
 *
 * @since 1.5
 * @author Doug Lea
 */
public interface Executor {

    /**
     * Executes the given command at some time in the future.  The command
     * may execute in a new thread, in a pooled thread, or in the calling
     * thread, at the discretion of the {@code Executor} implementation.
     *
     * @param command the runnable task
     * @throws RejectedExecutionException if this task cannot be
     * accepted for execution
     * @throws NullPointerException if command is null
     */
    void execute(Runnable command);
}

Executor

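The Java source of the Executor interface is shown above. It provides a standard way to decouple task submission from task execution, and it uses Runnable to represent a task that can be executed.

When you need to change how tasks are executed, you only change the Executor implementation; the backbone code that submits tasks stays untouched.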

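Thread pools

Creating and destroying threads is expensive. A thread pool reuses existing threads, which improves responsiveness, keeps enough threads around to keep the processors busy, and bounds the memory consumed by threads competing for resources.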

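The Executors class provides factory methods for creating thread pools:

newFixedThreadPool: a fixed-size thread pool

newCachedThreadPool: reclaims idle threads and adds new ones as demand requires, with no bound on the pool size

newSingleThreadExecutor: a single worker thread, which guarantees that tasks run sequentially in the order imposed by the task queue (FIFO by default)

newScheduledThreadPool: a fixed-size thread pool that runs tasks after a delay or periodically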
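The difference between two of these factories can be sketched as follows. This is a minimal illustration, not code from the book: the class PoolFactoryDemo and its methods are hypothetical names chosen for this example. It checks that a single-thread executor runs tasks in submission order and that a fixed pool never uses more workers than its size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the JDK or the book.
class PoolFactoryDemo {
    // Submits `count` numbered tasks to a single-thread executor and
    // returns the order in which they actually ran.
    static List<Integer> runInOrder(int count) {
        ExecutorService single = Executors.newSingleThreadExecutor();
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < count; i++) {
            final int id = i;
            single.execute(() -> order.add(id));
        }
        shutdownAndWait(single);
        return order;
    }

    // Runs `tasks` tasks on a fixed pool and counts how many distinct
    // worker threads executed them (never more than poolSize).
    static int distinctWorkers(int poolSize, int tasks) {
        ExecutorService fixed = Executors.newFixedThreadPool(poolSize);
        Set<String> names = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < tasks; i++) {
            fixed.execute(() -> names.add(Thread.currentThread().getName()));
        }
        shutdownAndWait(fixed);
        return names.size();
    }

    private static void shutdownAndWait(ExecutorService pool) {
        pool.shutdown(); // stop accepting tasks, let queued ones finish
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runInOrder(5));
        System.out.println(distinctWorkers(3, 20));
    }
}
```

The number of distinct workers can be lower than the pool size, because a fixed pool only creates threads as tasks arrive.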

ExecutorService


package java.util.concurrent;
import java.util.List;
import java.util.Collection;

/**
 * An {@link Executor} that provides methods to manage termination and
 * methods that can produce a {@link Future} for tracking progress of
 * one or more asynchronous tasks.
 *
 * <p>An {@code ExecutorService} can be shut down, which will cause
 * it to reject new tasks.  Two different methods are provided for
 * shutting down an {@code ExecutorService}. The {@link #shutdown}
 * method will allow previously submitted tasks to execute before
 * terminating, while the {@link #shutdownNow} method prevents waiting
 * tasks from starting and attempts to stop currently executing tasks.
 * Upon termination, an executor has no tasks actively executing, no
 * tasks awaiting execution, and no new tasks can be submitted.  An
 * unused {@code ExecutorService} should be shut down to allow
 * reclamation of its resources.
 *
 * <p>Method {@code submit} extends base method {@link
 * Executor#execute(Runnable)} by creating and returning a {@link Future}
 * that can be used to cancel execution and/or wait for completion.
 * Methods {@code invokeAny} and {@code invokeAll} perform the most
 * commonly useful forms of bulk execution, executing a collection of
 * tasks and then waiting for at least one, or all, to
 * complete. (Class {@link ExecutorCompletionService} can be used to
 * write customized variants of these methods.)
 *
 * <p>The {@link Executors} class provides factory methods for the
 * executor services provided in this package.
 *
 * <h3>Usage Examples</h3>
 *
 * Here is a sketch of a network service in which threads in a thread
 * pool service incoming requests. It uses the preconfigured {@link
 * Executors#newFixedThreadPool} factory method:
 *
 *  <pre> {@code
 * class NetworkService implements Runnable {
 *   private final ServerSocket serverSocket;
 *   private final ExecutorService pool;
 *
 *   public NetworkService(int port, int poolSize)
 *       throws IOException {
 *     serverSocket = new ServerSocket(port);
 *     pool = Executors.newFixedThreadPool(poolSize);
 *   }
 *
 *   public void run() { // run the service
 *     try {
 *       for (;;) {
 *         pool.execute(new Handler(serverSocket.accept()));
 *       }
 *     } catch (IOException ex) {
 *       pool.shutdown();
 *     }
 *   }
 * }
 *
 * class Handler implements Runnable {
 *   private final Socket socket;
 *   Handler(Socket socket) { this.socket = socket; }
 *   public void run() {
 *     // read and service request on socket
 *   }
 * }}</pre>
 *
 * The following method shuts down an {@code ExecutorService} in two phases,
 * first by calling {@code shutdown} to reject incoming tasks, and then
 * calling {@code shutdownNow}, if necessary, to cancel any lingering tasks:
 *
 *  <pre> {@code
 * void shutdownAndAwaitTermination(ExecutorService pool) {
 *   pool.shutdown(); // Disable new tasks from being submitted
 *   try {
 *     // Wait a while for existing tasks to terminate
 *     if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
 *       pool.shutdownNow(); // Cancel currently executing tasks
 *       // Wait a while for tasks to respond to being cancelled
 *       if (!pool.awaitTermination(60, TimeUnit.SECONDS))
 *           System.err.println("Pool did not terminate");
 *     }
 *   } catch (InterruptedException ie) {
 *     // (Re-)Cancel if current thread also interrupted
 *     pool.shutdownNow();
 *     // Preserve interrupt status
 *     Thread.currentThread().interrupt();
 *   }
 * }}</pre>
 *
 * <p>Memory consistency effects: Actions in a thread prior to the
 * submission of a {@code Runnable} or {@code Callable} task to an
 * {@code ExecutorService}
 * <a href="package-summary.html#MemoryVisibility"><i>happen-before</i></a>
 * any actions taken by that task, which in turn <i>happen-before</i> the
 * result is retrieved via {@code Future.get()}.
 *
 * @since 1.5
 * @author Doug Lea
 */
public interface ExecutorService extends Executor {

    /**
     * Initiates an orderly shutdown in which previously submitted
     * tasks are executed, but no new tasks will be accepted.
     * Invocation has no additional effect if already shut down.
     *
     * <p>This method does not wait for previously submitted tasks to
     * complete execution.  Use {@link #awaitTermination awaitTermination}
     * to do that.
     *
     * @throws SecurityException if a security manager exists and
     *         shutting down this ExecutorService may manipulate
     *         threads that the caller is not permitted to modify
     *         because it does not hold {@link
     *         java.lang.RuntimePermission}{@code ("modifyThread")},
     *         or the security manager's {@code checkAccess} method
     *         denies access.
     */
    void shutdown();

    /**
     * Attempts to stop all actively executing tasks, halts the
     * processing of waiting tasks, and returns a list of the tasks
     * that were awaiting execution.
     *
     * <p>This method does not wait for actively executing tasks to
     * terminate.  Use {@link #awaitTermination awaitTermination} to
     * do that.
     *
     * <p>There are no guarantees beyond best-effort attempts to stop
     * processing actively executing tasks.  For example, typical
     * implementations will cancel via {@link Thread#interrupt}, so any
     * task that fails to respond to interrupts may never terminate.
     *
     * @return list of tasks that never commenced execution
     * @throws SecurityException if a security manager exists and
     *         shutting down this ExecutorService may manipulate
     *         threads that the caller is not permitted to modify
     *         because it does not hold {@link
     *         java.lang.RuntimePermission}{@code ("modifyThread")},
     *         or the security manager's {@code checkAccess} method
     *         denies access.
     */
    List<Runnable> shutdownNow();

    /**
     * Returns {@code true} if this executor has been shut down.
     *
     * @return {@code true} if this executor has been shut down
     */
    boolean isShutdown();

    /**
     * Returns {@code true} if all tasks have completed following shut down.
     * Note that {@code isTerminated} is never {@code true} unless
     * either {@code shutdown} or {@code shutdownNow} was called first.
     *
     * @return {@code true} if all tasks have completed following shut down
     */
    boolean isTerminated();

    /**
     * Blocks until all tasks have completed execution after a shutdown
     * request, or the timeout occurs, or the current thread is
     * interrupted, whichever happens first.
     *
     * @param timeout the maximum time to wait
     * @param unit the time unit of the timeout argument
     * @return {@code true} if this executor terminated and
     *         {@code false} if the timeout elapsed before termination
     * @throws InterruptedException if interrupted while waiting
     */
    boolean awaitTermination(long timeout, TimeUnit unit)
        throws InterruptedException;

    /**
     * Submits a value-returning task for execution and returns a
     * Future representing the pending results of the task. The
     * Future's {@code get} method will return the task's result upon
     * successful completion.
     *
     * <p>
     * If you would like to immediately block waiting
     * for a task, you can use constructions of the form
     * {@code result = exec.submit(aCallable).get();}
     *
     * <p>Note: The {@link Executors} class includes a set of methods
     * that can convert some other common closure-like objects,
     * for example, {@link java.security.PrivilegedAction} to
     * {@link Callable} form so they can be submitted.
     *
     * @param task the task to submit
     * @param <T> the type of the task's result
     * @return a Future representing pending completion of the task
     * @throws RejectedExecutionException if the task cannot be
     *         scheduled for execution
     * @throws NullPointerException if the task is null
     */
    <T> Future<T> submit(Callable<T> task);

    /**
     * Submits a Runnable task for execution and returns a Future
     * representing that task. The Future's {@code get} method will
     * return the given result upon successful completion.
     *
     * @param task the task to submit
     * @param result the result to return
     * @param <T> the type of the result
     * @return a Future representing pending completion of the task
     * @throws RejectedExecutionException if the task cannot be
     *         scheduled for execution
     * @throws NullPointerException if the task is null
     */
    <T> Future<T> submit(Runnable task, T result);

    /**
     * Submits a Runnable task for execution and returns a Future
     * representing that task. The Future's {@code get} method will
     * return {@code null} upon <em>successful</em> completion.
     *
     * @param task the task to submit
     * @return a Future representing pending completion of the task
     * @throws RejectedExecutionException if the task cannot be
     *         scheduled for execution
     * @throws NullPointerException if the task is null
     */
    Future<?> submit(Runnable task);

    /**
     * Executes the given tasks, returning a list of Futures holding
     * their status and results when all complete.
     * {@link Future#isDone} is {@code true} for each
     * element of the returned list.
     * Note that a <em>completed</em> task could have
     * terminated either normally or by throwing an exception.
     * The results of this method are undefined if the given
     * collection is modified while this operation is in progress.
     *
     * @param tasks the collection of tasks
     * @param <T> the type of the values returned from the tasks
     * @return a list of Futures representing the tasks, in the same
     *         sequential order as produced by the iterator for the
     *         given task list, each of which has completed
     * @throws InterruptedException if interrupted while waiting, in
     *         which case unfinished tasks are cancelled
     * @throws NullPointerException if tasks or any of its elements are {@code null}
     * @throws RejectedExecutionException if any task cannot be
     *         scheduled for execution
     */
    <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
        throws InterruptedException;

    /**
     * Executes the given tasks, returning a list of Futures holding
     * their status and results
     * when all complete or the timeout expires, whichever happens first.
     * {@link Future#isDone} is {@code true} for each
     * element of the returned list.
     * Upon return, tasks that have not completed are cancelled.
     * Note that a <em>completed</em> task could have
     * terminated either normally or by throwing an exception.
     * The results of this method are undefined if the given
     * collection is modified while this operation is in progress.
     *
     * @param tasks the collection of tasks
     * @param timeout the maximum time to wait
     * @param unit the time unit of the timeout argument
     * @param <T> the type of the values returned from the tasks
     * @return a list of Futures representing the tasks, in the same
     *         sequential order as produced by the iterator for the
     *         given task list. If the operation did not time out,
     *         each task will have completed. If it did time out, some
     *         of these tasks will not have completed.
     * @throws InterruptedException if interrupted while waiting, in
     *         which case unfinished tasks are cancelled
     * @throws NullPointerException if tasks, any of its elements, or
     *         unit are {@code null}
     * @throws RejectedExecutionException if any task cannot be scheduled
     *         for execution
     */
    <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks,
                                  long timeout, TimeUnit unit)
        throws InterruptedException;

    /**
     * Executes the given tasks, returning the result
     * of one that has completed successfully (i.e., without throwing
     * an exception), if any do. Upon normal or exceptional return,
     * tasks that have not completed are cancelled.
     * The results of this method are undefined if the given
     * collection is modified while this operation is in progress.
     *
     * @param tasks the collection of tasks
     * @param <T> the type of the values returned from the tasks
     * @return the result returned by one of the tasks
     * @throws InterruptedException if interrupted while waiting
     * @throws NullPointerException if tasks or any element task
     *         subject to execution is {@code null}
     * @throws IllegalArgumentException if tasks is empty
     * @throws ExecutionException if no task successfully completes
     * @throws RejectedExecutionException if tasks cannot be scheduled
     *         for execution
     */
    <T> T invokeAny(Collection<? extends Callable<T>> tasks)
        throws InterruptedException, ExecutionException;

    /**
     * Executes the given tasks, returning the result
     * of one that has completed successfully (i.e., without throwing
     * an exception), if any do before the given timeout elapses.
     * Upon normal or exceptional return, tasks that have not
     * completed are cancelled.
     * The results of this method are undefined if the given
     * collection is modified while this operation is in progress.
     *
     * @param tasks the collection of tasks
     * @param timeout the maximum time to wait
     * @param unit the time unit of the timeout argument
     * @param <T> the type of the values returned from the tasks
     * @return the result returned by one of the tasks
     * @throws InterruptedException if interrupted while waiting
     * @throws NullPointerException if tasks, or unit, or any element
     *         task subject to execution is {@code null}
     * @throws TimeoutException if the given timeout elapses before
     *         any task successfully completes
     * @throws ExecutionException if no task successfully completes
     * @throws RejectedExecutionException if tasks cannot be scheduled
     *         for execution
     */
    <T> T invokeAny(Collection<? extends Callable<T>> tasks,
                    long timeout, TimeUnit unit)
        throws InterruptedException, ExecutionException, TimeoutException;
}

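ExecutorService

ExecutorService extends the Executor interface and adds methods for managing the lifecycle of task execution.

It has three states: running, shutting down, and terminated.

The shutdown method:

Starts a graceful shutdown
Stops accepting new tasks
Lets tasks that are already running finish
Lets tasks that were submitted but have not started yet finish
Then shuts the executor down
The method does not block until the shutdown has completed; to block, call awaitTermination afterwards

The shutdownNow method: attempts to cancel the running tasks (typically by interrupting them, on a best-effort basis), does not start any queued task, and returns the list of tasks that never started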
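The two shutdown paths described above can be observed with a small experiment. The following is a sketch under stated assumptions: ShutdownDemo and its method names are hypothetical, and the sleeping task merely stands in for long-running work that responds to interruption.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class illustrating shutdown() and shutdownNow().
class ShutdownDemo {
    // After shutdown(), newly submitted tasks are rejected.
    static boolean rejectsAfterShutdown() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.shutdown();
        try {
            pool.execute(() -> { });
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    // Keeps the single worker busy, queues `queued` more tasks, then calls
    // shutdownNow() and returns how many tasks never started.
    static int pendingAfterShutdownNow(int queued) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        pool.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // stands in for long-running work
            } catch (InterruptedException e) {
                // shutdownNow() interrupts the worker, so the task ends here
                Thread.currentThread().interrupt();
            }
        });
        for (int i = 0; i < queued; i++) {
            pool.execute(() -> { });
        }
        try {
            started.await(); // the first task is now running
            List<Runnable> neverStarted = pool.shutdownNow();
            pool.awaitTermination(10, TimeUnit.SECONDS);
            return neverStarted.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsAfterShutdown());
        System.out.println(pendingAfterShutdownNow(3));
    }
}
```

A task that ignores interruption would keep running after shutdownNow, which is why the Javadoc calls the cancellation best-effort.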

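Avoid the Timer class in new code. It is not formally deprecated, but it runs every task on a single thread, and an unchecked exception thrown by one task terminates that thread. Use ScheduledExecutorService instead, or build on DelayQueue if you need your own scheduling service.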
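As a rough illustration of the recommended replacement, the sketch below schedules a periodic task with ScheduledExecutorService. The class SchedulerDemo and the method ticksWithin are hypothetical names for this example only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: a periodic task without java.util.Timer.
class SchedulerDemo {
    // Runs a counter task every `periodMillis` until it has fired at least
    // `wantedTicks` times (or 10 seconds pass), then returns the count.
    static int ticksWithin(long periodMillis, int wantedTicks) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        AtomicInteger ticks = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(wantedTicks);
        scheduler.scheduleAtFixedRate(() -> {
            ticks.incrementAndGet();
            done.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow(); // cancels the periodic task
        }
        return ticks.get();
    }

    public static void main(String[] args) {
        System.out.println(ticksWithin(10, 3));
    }
}
```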

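Callable and Future

This reminds me of an interview question: how would you load a large page in Java that includes text, images, audio, video, and so on?

Every submit method of ExecutorService returns a Future. For a long-running or I/O-heavy task, you can submit a task that implements Callable and call get on the Future only when you actually need the result.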
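A minimal sketch of this submit-then-get pattern follows. FutureDemo and sumAsync are hypothetical names, and the summation loop only stands in for an expensive task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class: submit a Callable, fetch the result later.
class FutureDemo {
    // Computes 1 + 2 + ... + n on a pool thread and returns the result.
    static long sumAsync(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<Long> task = () -> {
                long sum = 0;
                for (int i = 1; i <= n; i++) {
                    sum += i;
                }
                return sum;
            };
            Future<Long> future = pool.submit(task);
            // The caller is free to do other work here.
            return future.get(10, TimeUnit.SECONDS); // blocks until done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumAsync(100));
    }
}
```

If the task throws, get rethrows the failure wrapped in an ExecutionException, so the caller sees it at the point where the result is requested.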

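The book points out that the gain from running heterogeneous tasks concurrently is bounded by how slow the slowest task is. Concurrency really pays off when a large number of homogeneous, mutually independent tasks are processed in parallel (ticket grabbing, flash sales, and so on).

ExecutorCompletionService:

Executor + BlockingQueue. On submission, it wraps the Callable in an extended FutureTask that overrides FutureTask's done method, so that when the task completes the Future is added to a BlockingQueue. Callers then retrieve the Futures of completed tasks with queue operations such as poll and take.

A quick note on poll and take, based on the source comments: both retrieve and remove the Future at the head of the queue. poll does not block and returns null when nothing is available; take blocks until a completed task is available. poll also has an overload that takes a timeout.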
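The completion-queue behaviour can be sketched like this. CompletionDemo and its methods are hypothetical names; the sleeping tasks stand in for work of different durations, and at least one delay must be passed because a fixed pool needs a positive size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class: results arrive in completion order.
class CompletionDemo {
    // Each task sleeps for its delay and returns it. The returned list
    // holds the delays in the order in which the tasks finished.
    static List<Integer> completionOrder(int... delaysMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(delaysMillis.length);
        CompletionService<Integer> service = new ExecutorCompletionService<>(pool);
        List<Integer> finished = new ArrayList<>();
        try {
            for (int delay : delaysMillis) {
                service.submit(() -> {
                    Thread.sleep(delay);
                    return delay;
                });
            }
            for (int i = 0; i < delaysMillis.length; i++) {
                finished.add(service.take().get()); // take() blocks
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return finished;
    }

    // poll() returns null right away when no task has completed.
    static boolean pollOnEmptyReturnsNull() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            return new ExecutorCompletionService<Integer>(pool).poll() == null;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(completionOrder(300, 50));
        System.out.println(pollOnEmptyReturnsNull());
    }
}
```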

TODO: walk through the source of the ExecutorService implementation classes

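Chapter 7: Cancellation and Shutdown

This chapter is fairly low-level. Perhaps because my understanding of interruption is lacking, I did not get much out of a first read-through. I plan to review the concept of interruption and then read the chapter again.

Any code can throw a RuntimeException. Whenever you call another method, be skeptical about its behavior: do not blindly assume that it will return normally or that it will throw only the checked exceptions declared in its signature. The less familiar you are with the code you are calling, the more skeptical you should be.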

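Daemon threads

Of all the threads created when the JVM starts, every one except the main thread is a daemon thread, including the GC threads. A new thread inherits the daemon status of the thread that created it, so threads created by the main thread are normal threads by default.

"When a thread exits, the JVM checks the other running threads, and if all of them are daemon threads, the JVM proceeds with a normal shutdown." I could not make sense of this sentence at all. I feel this whole chapter is poorly translated, so I am marking it to check how the original English edition puts it.

When the JVM halts, all remaining daemon threads are simply abandoned, so do not hold resources that need to be released or cleaned up in a daemon thread.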
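The inheritance rule for daemon status is easy to check directly. The sketch below uses a hypothetical class DaemonDemo; it creates a parent thread with a chosen daemon flag and reports the flag that a thread created inside it starts out with.

```java
// Hypothetical demo class: daemon status is inherited from the creator.
class DaemonDemo {
    // Starts a parent thread with the given daemon flag, lets it create a
    // child thread, and returns the child's inherited daemon flag.
    static boolean childInheritsDaemon(boolean parentDaemon) {
        boolean[] childFlag = new boolean[1];
        Thread parent = new Thread(() -> {
            Thread child = new Thread(() -> { });
            childFlag[0] = child.isDaemon(); // copied from the creating thread
        });
        parent.setDaemon(parentDaemon); // must be called before start()
        parent.start();
        try {
            parent.join(); // join makes the parent's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return childFlag[0];
    }

    public static void main(String[] args) {
        System.out.println(childInheritsDaemon(true));
        System.out.println(childInheritsDaemon(false));
    }
}
```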

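The chapter closes with advice to avoid finalize methods, which the JVM book also mentions. In practice this is what everyone does now: call the various close, release, and teardown methods in a finally block to clean up, close connections, and return file or socket handles.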

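Chapter 8: Applying Thread Pools

Sizing thread pools

To size a thread pool correctly, you need to analyze the computing environment, the resource budget, and the nature of the tasks.

How many CPUs are there?
How much memory?
Are the tasks compute-intensive, I/O-intensive, or both?
Do they need a database connection?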

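For compute-intensive tasks on a system with N_cpu processors, a pool of N_cpu+1 threads usually achieves optimal utilization. The extra thread ensures that CPU cycles do not go unused when another thread pauses for some reason.

For tasks that include I/O or other blocking operations, threads do not run all the time, so the pool should be larger. To size it properly, you need to estimate the ratio of a task's waiting time to its compute time.

A simple way to tune the pool size is to run the application under a benchmark load with pools of different sizes and observe the level of CPU utilization.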

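The book gives this formula for the pool size that keeps the processors at a desired utilization:

    N_threads = N_cpu * U_cpu * (1 + W/C)

where U_cpu is the target CPU utilization (between 0 and 1), W is the wait time, and C is the compute time.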
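To make the formula concrete, here is a small sketch that evaluates it. PoolSizing and optimalThreads are hypothetical names, and rounding to the nearest whole thread with a minimum of one is an assumption of this example, not something the formula prescribes.

```java
// Hypothetical helper that evaluates N_threads = N_cpu * U_cpu * (1 + W/C).
class PoolSizing {
    static int optimalThreads(int nCpu, double targetUtilization,
                              double waitTime, double computeTime) {
        if (nCpu < 1 || computeTime <= 0 || waitTime < 0
                || targetUtilization <= 0 || targetUtilization > 1) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        double threads = nCpu * targetUtilization * (1 + waitTime / computeTime);
        // Rounding and the floor of one thread are choices of this sketch.
        return Math.max(1, (int) Math.round(threads));
    }

    public static void main(String[] args) {
        int nCpu = Runtime.getRuntime().availableProcessors();
        // Example: tasks that wait 90 ms for every 10 ms of computation,
        // aiming for 50% CPU utilization.
        System.out.println(optimalThreads(nCpu, 0.5, 90, 10));
    }
}
```

With 8 CPUs, a target utilization of 0.5, and W/C = 9, the formula gives 8 * 0.5 * 10 = 40 threads; a purely compute-bound task (W = 0) at full utilization gives 8.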

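For other resources such as memory, the constraint on the pool size can be calculated as follows:

    First work out how much of the resource each task needs, then divide the total available amount of that resource by the per-task requirement. The result is the upper bound on the pool size.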

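Queued tasks

When requests arrive faster than the server can process them, they pile up. ThreadPoolExecutor handles this by letting you supply a BlockingQueue to hold the tasks awaiting execution.

newFixedThreadPool uses an unbounded LinkedBlockingQueue by default. If all worker threads are busy, tasks wait in the queue, and if processing never catches up, the queue grows without bound.

A safer approach is a bounded queue, such as an ArrayBlockingQueue or a bounded LinkedBlockingQueue, combined with a suitable saturation policy (what to do with further requests once the queue is full).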
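A bounded pool can be sketched as follows. BoundedPoolDemo is a hypothetical class; it uses one worker thread and an ArrayBlockingQueue, keeps every task blocked on a latch, and counts how many submissions are accepted before the default handler rejects one.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: a pool with a bounded work queue.
class BoundedPoolDemo {
    // Returns the number of tasks accepted before the pool is saturated:
    // one running on the worker plus `capacity` waiting in the queue.
    static int acceptedBeforeRejection(int capacity) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(capacity)); // default: AbortPolicy
        int accepted = 0;
        try {
            while (true) {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the task busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            }
        } catch (RejectedExecutionException e) {
            // Queue full and no spare thread: the pool is saturated.
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(acceptedBeforeRejection(2));
    }
}
```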

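Saturation policies

The saturation policy of a ThreadPoolExecutor can be changed by calling setRejectedExecutionHandler.

AbortPolicy: throws RejectedExecutionException and leaves it to the caller to handle. This is the default.
DiscardPolicy: silently discards the task.
DiscardOldestPolicy: discards the task at the head of the queue, that is, the one that would have run next (with a priority queue this is the highest-priority task), and then tries to submit the new task again.
CallerRunsPolicy: a throttling mechanism. It neither discards the task nor throws an exception; instead it hands the task back to the caller, which runs it in the thread that called execute. While that thread, for example the main thread of a server, is busy with the task it cannot accept new requests, so they queue up in the TCP layer. If the TCP layer fills up as well, it starts dropping requests. The result is graceful degradation under load.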
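The caller-runs behaviour can be demonstrated with a deliberately tiny pool. In this sketch, CallerRunsDemo is a hypothetical class: one worker and a queue of capacity one are filled with blocked tasks, so the third submission is executed by the submitting thread itself.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: overflow work runs in the submitting thread.
class CallerRunsDemo {
    static boolean overflowRunsInCaller() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        pool.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        String[] ranOn = new String[1];
        try {
            pool.execute(blocker); // occupies the only worker
            pool.execute(blocker); // fills the queue
            // Saturated: CallerRunsPolicy runs this task right here.
            pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return Thread.currentThread().getName().equals(ranOn[0]);
    }

    public static void main(String[] args) {
        System.out.println(overflowRunsInCaller());
    }
}
```

Because the submitting thread is busy running the overflow task, it cannot submit more work in the meantime, which is the throttling effect described above.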

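By extending ThreadPoolExecutor (subclassing) and overriding beforeExecute, afterExecute, and terminated, you can add logging, monitoring, and statistics gathering to task execution. When measuring task run time, beforeExecute and afterExecute can pass the start time to each other through a ThreadLocal variable.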
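A minimal sketch of such a subclass is shown below. The class name TimingThreadPool and the helper methods completedCount and runAndCount are assumptions of this example; only the three overridden hooks are ThreadPoolExecutor API.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pool that records per-task timing through the hook methods.
class TimingThreadPool extends ThreadPoolExecutor {
    // Each worker thread stores the start time of its current task here.
    private final ThreadLocal<Long> startTime = new ThreadLocal<>();
    private final AtomicLong numTasks = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    TimingThreadPool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        startTime.set(System.nanoTime());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        try {
            // Runs on the same worker thread as beforeExecute.
            totalNanos.addAndGet(System.nanoTime() - startTime.get());
            numTasks.incrementAndGet();
        } finally {
            super.afterExecute(r, t);
        }
    }

    @Override
    protected void terminated() {
        try {
            long n = numTasks.get();
            long avg = (n == 0) ? 0 : totalNanos.get() / n;
            System.out.printf("tasks=%d, avg=%dns%n", n, avg);
        } finally {
            super.terminated();
        }
    }

    long completedCount() {
        return numTasks.get();
    }

    // Runs `tasks` trivial tasks and returns how many the hooks counted.
    static long runAndCount(int tasks) {
        TimingThreadPool pool = new TimingThreadPool(2);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> { });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return pool.completedCount();
    }

    public static void main(String[] args) {
        System.out.println(runAndCount(5));
    }
}
```

The ThreadLocal works here because beforeExecute and afterExecute for a given task are both called by the worker thread that runs the task.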