Build My Academic Paper Feedback Network

02 June 2017

I skim through each top-level storage conference and try to build a framework to keep up with storage technology updates worldwide, to filter quickly through a large volume of papers, and to select the good ones. There are other sources too, such as industry summits, open-source project commits and blueprints, and the moves of leading companies. This article focuses on papers.

Why read papers

Generally, papers are good sources to understand technology in depth.

The introduction and related-work sections are usually the best resource for grasping a new domain. Some papers have very informative introductions.
The analytical skills, that is, how the authors work out improvements and dig for root causes, can be borrowed.
The design points and experience can be applied in other places.
The evaluation section shows how to evaluate systematically and with quality.

Essentially, a peek into top human minds is the most interesting part.

Research and industry

Some research work needs 5+ years (or much more) to be widely adopted in industry, like erasure coding, the LSM-tree, Paxos, etc. Those who did the pioneering work, and those who were first to bring it into industry, are remarkable. Such work may be the secret sauce for building an industry-leading product. Selecting the really valuable research work (some of it little known) and applying it to industrial system development (which takes a lot of analysis, decision making, and adoption changes) is the top skill.

Some research papers take much less time and can be quickly carried over to industry. These are papers proposing various improvements to existing technology. In maybe 1-2 years they will be adopted in many places.

Sometimes research work follows an industry breakthrough, like MapReduce, scale-out storage, etc. It proposes new improvements, experience, and evaluations, which we can learn from. Research work may also walk alongside industry, as in deep learning, GPU computing, etc.

Some research work is brought directly into industry. Usually, someone from academia did remarkable research, like Ceph, and built a startup company on it. Such work may eventually affect industry trends.

There are papers written to expose industry system designs, like BigTable, Cassandra, etc. They are remarkable learning sources, and new projects improving on them usually follow.

Besides the above, papers are a pool of sources for learning problem-analysis skills, keeping up to date with information and knowledge, and sharing and borrowing many technology improvements.

How to fast read

Papers are structured for fast reading.

Everything important in a paper should be stated in the abstract. If the paper achieves good evaluation results, the abstract will say so.
Besides the informative background, contemporary work, motivations, and challenges, the introduction/background/overview part usually lays out the main designs and key improvements/contributions of the paper; it can even be seen as a shorter version of the paper.
By common practice, the first sentence of every paragraph summarizes what the paragraph is going to say; after it come examples, detailed designs, and discussion of the details.
Abstract, introduction, related work, evaluation, and conclusion are the five key parts of a paper for fast reading. They define where the paper stands.
Generally, most papers follow one pattern: start from one improvement/finding -> write a new prototype system to illustrate it -> dig into the deeper layer of problems and analyze more -> add some secondary improvements/fixes. Once you see the pattern, you can catch the main thread. Some other papers are whole-system designs; they cover every aspect of the system, so they usually carry more information.
The most helpful thing, actually, is that after you have read 100+ papers in a specific domain, fast reading comes naturally. You almost know what a paper is going to say.

Essentially, it is not the volume of words but the volume of new information that determines reading speed.

How to select good papers

Even within top-level storage conferences alone, there are too many papers each year. Here I share some experience on selecting good ones to read. It applies only to storage.

The best paper award winners (and nominees) and the invited papers at each top-level conference are usually of high value. For example, Paragon was nominated for the Best Paper Award at ASPLOS, selected as an invited paper in ACM Transactions on Computer Systems (TOCS), and selected for IEEE Micro’s Top Picks for 2013. Here’s a link to the collection of all best papers.
Good papers have a quickly growing reference count in the first 1-2 years. 10 refs in the first year can be a sign of a good paper. Regardless of age, if a paper has 1000+ or 2000+ refs, it is a breakthrough paper and very much worth reading. If a paper has 200+ to 300+ refs, it is usually a big-improvement paper. 100+ means the paper is influential. You can also search with Google Scholar, which ranks by reference count and recency.
Read the paper’s abstract. Good papers usually show significant performance improvements, and it is better if they are evaluated solidly on real production workloads. The important thing is that every true highlight should be listed in the abstract, so you should find what you need there. Also, papers written by industry-leading companies such as Google, Facebook, and Microsoft about their production systems are usually good.
Dig into the paper’s reference list. Usually, only papers with notable achievements get referenced by others. A famous paper’s reference list is also worth digging into. You may also find famous authors and their trail of published papers, like the beginnings of the log-structured FS.
If the paper’s author founds a startup or open-source project on it, that usually means it is a really good paper. For example, Ceph, Paragon & Quasar, RAMCloud.
Some papers have an extensive background introduction, which is very helpful for understanding a new technology domain in depth. Such papers may even trace how the technology evolved through each breakthrough and the corresponding papers.
Good storage papers may reside in multiple conferences, not only storage ones. For example, OSDI is nominally about operating systems, but it is also a top venue for storage papers. NSDI is nominally a networking conference, but it also has good storage papers because distributed storage is a networked system. Also, top-level conferences usually stand out for the amount of work, the depth of analysis, or good performance results, but not necessarily for the smartest ideas, while ATC does have some smart but simpler papers.
You may even hunt through college curricula. They help build solid understanding. For example, those of the university from which Ceph’s author Sage graduated.
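
The reference-count thresholds in the list above can be written down as a small triage function. This is only a sketch under stated assumptions: the function name `citation_tier`, the tier labels, and the exact cut-offs (the lower bound of each range mentioned above) are illustrative choices, not part of any tool.

```python
def citation_tier(citations: int, years_since_publication: float) -> str:
    """Map a reference count to the rough tiers described in the text.

    Cut-offs take the lower bound of each range from the text; the labels
    are hypothetical and not a standard classification.
    """
    if citations < 0 or years_since_publication < 0:
        raise ValueError("inputs must be non-negative")
    # Age does not matter for the higher tiers.
    if citations >= 1000:
        return "breakthrough"
    if citations >= 200:
        return "big improvement"
    if citations >= 100:
        return "influential"
    # For a recent paper, about 10 refs within the first year is a good sign.
    if years_since_publication <= 1 and citations >= 10:
        return "promising"
    return "unrated"


if __name__ == "__main__":
    # Made-up inputs, only to show the call shape.
    print(citation_tier(1500, 10))
    print(citation_tier(12, 1))
```

A count below every threshold comes back as "unrated" rather than "bad", since the other signals in the list (awards, production evaluation, follow-up projects) may still mark the paper as worth reading.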

Others

After reading a paper, search for who references it. Those citing papers may have valuable, inspiring, or insightful comments on it.

Storage top-level conferences

Here I list the top-level conferences related to storage. (The “others” part is not top-level, but ATC is worth reading.)

Architecture: ISCA, HPCA, ASPLOS
Storage: FAST, MSST
Operating Systems: SOSP, OSDI
Networking: NSDI, SIGCOMM
Database: SIGMOD, VLDB
Others: ATC, ACM TOCS, ACM TOS, hotCloud

Each of them probably publishes ~50 papers a year.

Ranking good papers

I want to index the papers of recent years and rank them by reference count, so that I can find which papers are good and whose reference counts are growing quickly.

In the end I found Google Scholar to be the handy tool. Here I list each conference, its 2016 home page, and Google Scholar search links for its papers. On the search results page, the top ones are usually good papers.

Besides, many conferences, e.g. USENIX, put their videos on YouTube. By ranking by view count I can find which papers attract more people. This ranking responds more quickly than paper reference counts. To find the search keyword, click a paper’s “Cite” button on Google Scholar.
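
The two rankings described above, by reference-count growth and by video view count, can be sketched as follows. This is a minimal illustration, assuming the counts have already been collected by hand; the `Paper` record, the function names, and the sample numbers are all hypothetical, and nothing here queries Google Scholar or YouTube.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int             # publication year
    citations: int        # reference count, collected by hand
    video_views: int = 0  # view count of the talk video, if any


def citations_per_year(paper: Paper, current_year: int) -> float:
    """Average references gained per year; age is clamped to at least one year."""
    age = max(current_year - paper.year, 1)
    return paper.citations / age


def rank_by_citation_growth(papers, current_year):
    """Fastest-growing reference counts first."""
    return sorted(papers, key=lambda p: citations_per_year(p, current_year), reverse=True)


def rank_by_views(papers):
    """Most-watched talks first; this signal reacts sooner than references."""
    return sorted(papers, key=lambda p: p.video_views, reverse=True)


if __name__ == "__main__":
    # Made-up sample data, only to show the call shape.
    sample = [
        Paper("paper-a", 2014, 90, 500),
        Paper("paper-b", 2016, 40, 200),
        Paper("paper-c", 2015, 30, 900),
    ]
    for p in rank_by_citation_growth(sample, 2017):
        print(p.title, citations_per_year(p, 2017))
    for p in rank_by_views(sample):
        print(p.title, p.video_views)
```

Dividing by age keeps an older paper with a large total from hiding a newer one whose count is climbing faster, which is the case the ranking is meant to surface.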

FAST: top-level storage conference; favors filesystem, reliability, SSD, and kvstore papers; lacks distributed architecture designs (I guess those are supposed to go to OSDI/SOSP)
- Home 2016
- Searches 2016: part-1, part-2
- Searches 2018: 16th USENIX Conference on File and Storage Technologies “16th USENIX”
- Searches 2019: By Google search By YouTube view count
MSST: top-level, more industry oriented; also includes panels and talks; Ceph once occupied the headline.
USENIX ATC: many more papers; contains smart ideas; the Copyset paper was published here
OSDI: top-level, storage architecture, paxos, scheduling, big data, OS components, etc. A place to publish new distributed storage system architectures. Google likes to publish here, e.g. Bigtable, MapReduce, Spanner.
- Home 2016
- Searches 2016: part-1, part-2
- Searches 2018: Proceedings of the 13th USENIX Symposium on Operating Systems
- Searches 2019
SOSP: top-level, storage architecture, paxos, big data, OS components, etc. A place to publish new distributed storage system architectures, e.g. Google File System
- Home 2015 (per 2 years)
- Searches 2015
ASPLOS: top-level, storage architecture, scheduling, OS, etc. The Quasar & Paragon schedulers were published here. Most papers are published into sigarch, sigops, sigplan.
- Home 2016
- Searches 2016: asplos part-1, asplos part-2, sigarch, sigops, sigplan
- Searches 2019
ISCA: top-level, computer architecture, cache, resource utilization, etc. Google Heracles was published in it.
- Home 2016
- Searches 2016
HPCA: high-performance computing.
- Home 2016
- Searches 2016
NSDI: top-level networking, also includes some distributed storage systems. The ZLog CORFU paper was published here.
SIGCOMM: top-level networking, includes big-player papers such as those from Google and Facebook. DCTCP and Jupiter Rising were published here.
- Home 2016
- Searches 2016
SIGMOD/PODS: top-level database
- Home 2016: SIGMOD / PODS
- Searches 2016: SIGMOD, PODS
VLDB: top-level database
- Home 2016: research track, industrial track
- Searches 2016
hotCloud: hot topics in cloud computing
- Home 2016
- Searches 2016
ACM TOCS: ACM Transactions on Computer Systems. The old and classic venue; the beginning of the log-structured filesystem, “The design and implementation of a log-structured file system”, was published here; also the Bigtable paper.
- Home
- Searches 2016
ACM TOS: ACM Transactions on Storage. Archival journal that deals with storage.
- Home
- Searches 2016

More importantly, for all the best papers in the above conferences

The collection of all best papers

More conveniently, the paper presentation video click count is a nice ranking.

Youtube FAST23 video list and click count

Tour across conferences

To grasp how storage technology worldwide evolves, I skim through the papers of each conference in 2016 and extract what everyone is talking about.

MSST 2016

    1. the panels, talks, invited tracks, are very helpful.
    2. compared to FAST, MSST is closer to industry, and FAST is more academic (but also has Google and Microsoft industrial papers)
    3. the topics covered: flash, dedup, archival,
                           Storage Performance Enhancements
                           File Systems for Non-Volatile Memory
                           Store More, Longer, and for Less: Deduplication and Archival Systems
                           Spotlight on Flash memory and Solid-State Drives
                           Understanding Storage Systems through Measurements and Analysis
                           On-the-Go Storage

ATC 2016

there is a section, "Best of the Rest", that lists the Best Papers from other conferences, e.g. FAST, NSDI, SOSP, USENIX Security, etc.
ATC gives a Best Student Paper award. A lot of ATC papers are smart.
ATC covers almost every hot topic in storage: KV Store, Security, Cloud, Consensus, Caching, Indexing, Network, Big Data, OS.

OSDI 2016

    1. topics: filesystem crash verification, compiler, libraries on multi-core, OS context & processes;
               cloud scheduling, including the CARBYNE, Firmament which were once recommended, all sort of;
               storage transaction, replication protocol, in-memory, RDMA, RPC;
               networking NFV, reachability analysis, datacenter networking, disaggregated interconnect
               graph processing, Tensorflow, RDF graphs
               software engineering, programming languages, GC, code re-randomization (security), JVM,
               EC Cache, evolution of the multiprocessor software architecture, geo-distributed analytics, Dynamic information flow tracking (DIFT)
               Paxos, Consensus, crash fault-tolerance, state-machine replication (SMR), concurrency control
               security, sandbox, private/secure communication, analytics over encrypted data
               Troubleshooting, performance profiling, config error detection, live traffic tests
               formal verification, certified OS, per-application library OS, container security, Intel SGX, huge page
               reactive data management service, general purpose sharding, co-locate & resource utilization, data quality
    2. reading the abstracts of each paper is a good way to know the domains. also watch for the best papers

SOSP 2015

    1. topics: Formal Systems, Crash Hoare Logic, provably correct distributed systems
               Distributed transactions, in-memory, RDMA, RPC exactly-once semantics, RAMCloud
               Distributed Systems, Paxos, private messaging,
               Concurrency and Performance, MapReduce, read-log-update, synchronization mechanism, performance Profiling
               Energy Aware Systems, mobile, tablet, Software Defined Batteries, Power Management
               More Distributed Transactions, scalable SQL, replication protocol, consistency, ACID concurrency control,
               Experience and Practice, anomalies in consistency, CPU validation,
               Bugs and Analysis, Root Cause Diagnosis, filesystem semantic bugs,
               Big Data, memory pressure, programming model, graph processing, mining,
               Storage Systems, filesystem directory cache, maintenance (backup, layout optimization, etc), Split-Level I/O Scheduling
    2. SOSP and OSDI are each held every two years, OSDI in even years and SOSP in odd years. They provide high-quality storage papers.

ASPLOS 2016

    1. topics: Multicore, on-chip wireless communication, market-based chip shared resource allocation, performance-management runtime QoS
               IO, sidecores, network DMA interface, crash-consistency models
               Memory Management, memory allocator,  Cache Management, tail latency prediction & correction
               Debugging, static bug checking, Detecting data races, formal verify high-assurance file-system
                          Causality inference, Non-Deterministic Concurrency Bugs, Reference counts bugs
               Heterogeneous Architectures and Accelerators, OS design for Heterogeneous Manycores, Energy Efficient, offloading to a low-power processor, Manycore, verifying memory ordering
                                                             FPGAs, kernel-based data parallel programming models, interference, PCI-e bandwidth contention, GPU, Behavioral Specialized Accelerators,
               Security, IOMMU Protection, encrypt NVM, Verified Untrusted System Services, SGX, trusted
                         Information-Flow Tracking, Program State Relocation, “rowhammer” attack,
               Code Generation and Synthesis, code optimizer, Assembly, compilers, binary analysis, Code variants
               Energy and Thermal Management, Power management, Energy-autonomous, sprinting architecture
               Emerging Memory Technologies, NVRAM Write-Ahead Logging, Transactions on NVRAM, image encoding on storage cell, Persistent Memory Logging
               Cloud Computing, Interference Management, language runtime, Resource-Efficient Provisioning, Workflow Monitoring
               OS Optimizations, Kernel TCP Design, Short-Lived Connections, Virtual Address Spaces, heterogeneous memories
               Non-traditional Computer Systems, non-Von Neumann architecture, Pattern-Recognition Processors, controlling approximation, Approximate Computing, DNA-based archival storage
               Transactional Memory, recovery from unexpected permanent processor faults, Lock-Free Multicore Synchronization, breaks the serialization of hardware queues
    2. ASPLOS is somewhat closer to hardware (processors, cores, interfaces, IO, memory) and thus involves more disciplines, as said in its description
       ASPLOS also has more papers about debugging/languages/compilers, and the low levels of the OS that interface with hardware
    3. ASPLOS has many invited talks/speeches besides the papers, also the
       "Synopsis of the ASPLOS ’16 Wild and Crazy Ideas (WACI) Invited-Speakers Session"

ISCA 2016

    1. topics: Neural Networks, DNN, Processing-in-memory (PIM), NN acceleration
                                DNN compression, mobile vision, image sensor, CNN, accelerator, Low-Power
                                energy consumption, minimal data movement, dataflow, High-Density 3D Memory, neuromorphic architecture, domain-specific ISA
               Heterogeneous Architecture / Approximate Computing, Work stealing, Work-mugging, object deserialization , co-processor, Approximate Acceleration
               Caches, Cache Replacement, TLB, virtual caching, LLC energy-efficient,
               Hardware Design, Reconfigurable Hardware, FPGA, neural networks, RTL designs, On-core microarchitectural, HW/SW co-designed
               Accelerators, Near-Data Processing, big data, Data-intensive, Graph Analytics Accelerators, ASIC accelerators, Bitcoin mining ASIC Clouds
               GPUs, Cache Efficiency, Transparent Offloading, Near-Data Processing in GPU, 3D-stacked memory, memory-intensive, multiprogramming GPUs,
                     Locality Aware thread block (TB) Scheduler, Address Translation on GPUs, Thread-Level Parallelism,
               NoC / Virtualization, NoC-based CMP, VM Interpreters, ARM Virtualization,
               Cache / Memory Compression, cache compression vs replacement, Compression in Many-core,
               Reliability, high reliability memory systems, On-Die ECC, Production-Run Software Failure Diagnosis, neural hardware
                            end-to-end ECC, on-chip ECC, error pattern transformation, memory reliability, memory faults, Memory Repair,
               Microarchitecture, ISA extension, data-level parallelism (DLP), SIMD, Simultaneous multithreading (SMT) out-of-order cores, enhanced Memory Controller,
               Datacenters, Tail Latency, Precise Load Testing, Power Management, Scheduling, Energy Proportional Servers, Power Virus, power attack defense
               Memory, FPGA, DRAM-Based Reconfigurable Acceleration Fabric, DRAM subarrays, Lifetime in Resistive Memories, PCRAM and ReRAM Wear leveling / wear limiting, Memory Inter-arrival Time Traffic Shaping
                       Phase Change Memory (PCM),  re-constructing data, Nested paging, shadow paging, virtual machine monitor (VMM), DRAM data bus energy-efficiency, encoding
               Emerging Architectures, Approximate computing, Markov Chain Monte Carlo (MCMC) sampling, Molecular Optical Gibbs Sampling Units, Analog Accelerator
               Energy-efficient Computing, Resource Efficiency, MIMO (multiple input, multiple output) controller, Ultra-Low-Power processors, Sub-core Configurable Architecture
    2. ISCA talks a lot about caches, memory, accelerators, GPUs, software/hardware co-design architecture changes, and also neural networks.

HPCA 2016

    1. topics: Hardware Accelerators, Boltzmann machine, deep learning, FPGA, resistive RAM (RRAM), memory-centric, compute in memory, machine learning, generate accelerators, general-purpose programmability, domain-specific
               Mobile/IoT, mobile CPU design, smartphone, energy control, QoS, Software Defined Radio (SDR)
               NVM, non-intrusive memory controller, Compression-expansion coding, reconfigurable architecture, resistive RAM, Access-transistor-free memristive crossbars
               Reconfigurable Architectures, FPGA, OpenCL, Near-Data Processing, coarse-grain reconfigurable architectures (CGRA), dynamic binary translation,
               GPUs, Voltage noise, manufacturing process variation,  core tunneling, GPU pre-execution, Warps, Compression,
               Cache, cache placement, Virtual caches, cache synonyms, Modeling Cache Performance, LRU, cache tag management, Tagless DRAM Caches (TDCs),
               Coherence and Consistency, Sequential Consistency Violations, Contention for shared memory, true sharing / false sharing, hardware transactional memory (HTM),
               Interconnects, network on chip (NoC), dynamic voltage/frequency scaling (DVFS), chip multiprocessors (CMPs), Photonic interconnects, laser gating technique, power efficiency
               GPGPUs, page memory, Simultaneous Multikernel GPU, dynamic sharing, warp scheduling,
               Security, timing-channel protection, secure memory scheduling, key recovery timing attack on a GPU, side-channel vulnerability on GPU, last-level cache side channel attacks,
               Large-Scale Systems, NUMA, core allocation, Power oversubscription, power capping, Power Surges, Fuel cells power source, datacenters, energy storage devices,
               Potpourri, mathematical computing architecture, power efficiency, Hardware prefetching, memory page migration, asymmetric regions memory architecture
               Industry Session, Cache coherence between CPUs and GPUs, consolidated server racks, datacenter server architectures, Soft-Errors on GPUs, mobile storage architecture,
               Memory Technology, significant variations and degraded timings, restore cell data, die-stacked DRAMs, 3D DRAM, memory faults, bulk data movement in DRAM, DRAM latency
               Best of IEEE Computer Architecture Letters, Associative Processor (AP), resistive memory, Stochastic and Deterministic Computing, heterogeneous architectures, specialized hardware, Heterogeneous Power, power mismatching,
               Modeling and Testing, heterogeneous multicore processors, thermal estimation, Microarchitecture, simulation, memory consistency model (MCM),
               Caches and TLB, Address Translation, Energy-Efficient, LLCs, dead-block management, Cache QoS, Cache Monitoring Technology (CMT), Cache Allocation Technology (CAT),
               Microarchitecture, Simultaneous multithreading (SMT) processors, IBM POWER8, symbiotic job scheduling, Voltage Scalability, energy efficiency, sharing physical register,
    2. HPCA talks a lot about high-end / accelerating hardware, and in depth. There are accelerators, NVM, resistive RAM, GPUs, caches (hardware), interconnects, large NUMA, processors. There are also many evaluation papers on new approaches/technology.

NSDI 2016

    1. topics: Network Architectures and Protocols, Software-Defined Internet Exchange Points (SDXes), Rack-scale computers, rack-scale network, SDN,  load balancing, blockchain, bitcoin, mobile cellular devices, cellular traffic,
               Content Delivery, user delays, media delivery, cryptographic, private information retrieval (PIR), page load latency, Dependency Tracking, video Quality of Experience (QoE),
               Wireless I, low power, hardware, Localization, indoor positioning, AP, uplink, human blockage,
               Flexible Networks, Measuring the flow of traffic, traffic engineering, SDN optimization, Middlebox, NFV, outsource network processing to the cloud, middlebox outsourcing,
               Dependability and Monitoring, Checking whether a network correctly implements intended policies, minimized bug executions, netflow, Internet routes, Network forensics and incident response,
               Resource Sharing, predict performance, Web Memory Cache allocation, fair allocation of memory cache, Resource Fairness (DRF), isolation,
               Distributed Systems, Consensus, atomic broadcast, FPGA, stream processing, assignments, Social Networks, file slicing API, zero-copy, Storage-Performance Tradeoff,
               In-Network Processing, packet scheduling, Least Slack Time First (LSTF), Explicit Congestion Notification (ECN), load balancer, ECMP, Middlebox, inspect packet payloads,
               Security and Privacy, Delegations, Community Repositories, Anonymous Reputation, Tracking-Resistant, Tor, latency-based congestion control, expose user data to web services, Mobile, access control,
               Wireless II, Cellular Network,  data center networks (DCN), 3D Ring Reflection Spaces (RRSs), Physical Vibration, vibratory radio, privacy threat, sensor obfuscation technique,
    2. Although NSDI is nominally a networking conference, it does provide good storage system papers. There are also SDN, web sites (and CDN, latency, memory cache), mobile (and wireless), protocols, middleboxes, packet processing, etc.

SIGCOMM 2016

    1. topics: SDN & NFV I, reconfigurable hardware, packet processing, FPGA, data-plane algorithms, OpenFlow, line rate,
               Wide Area Networks, high-available, network infrastructure, optical WAN, bulk transfer, inter-datacenter transfer, traffic engineering,
               Monitoring and Diagnostics, network flow monitoring, "one-big-switch" abstraction, event monitoring, differential provenance,
               Scheduling, fair queuing, multi-tenant, coflows, mix-flows with/without deadlines, bandwidth allocation control,
               Datacenters I, Datacenter Time Protocol (DTP), network management, RDMA, flow-control mechanism, datacenter interconnects,
               Verification, control plane analysis, network static analysis, symbolic execution, BGP,
               Networked Applications, page load time, end-to-end latency, video quality-of-experience (QoE), Telephony Call Quality,
               Wireless, inter-technology backscatter, FPGA, ultra-low power, on-body sensor, energy budget, power-proportional, distributed MIMO, cellular network,
               Datacenters II, congestion control, virtualized, DCTCP, vSwitch, root cause analysis,
               Censorship and Choice, ISP, traffic policing, policing and pacing and shaping, net neutrality, L2 STP and L3, convergence,
               SDN & NFV II, network-wide deployment, network functions (NF), software switch, P4, OVS, OpenFlow,
               Best of CCR, transparency, privacy, social interactions, human rights,
    2. SIGCOMM talks about packet processing, datacenter networking, WAN, BGP, wireless, SDN/NFV. It is not as close to distributed storage systems as NSDI is.

SIGMOD 2016

    1. topics: (too many. only picking 3 papers each)
               Scalable Analytics and Machine Learning, join, machine learning (ML), batch gradient descent, video recommendation,
               Privacy and Security, social graph, Differential privacy,
               Logical and Physical Database Design, Data Warehouse, NoSQL, Oracle, JSON data management, Couchbase,
               New Storage and Network Architectures, OLTP and OLAP, Vectorization, JIT, flash translation layer (FTL), PVB, Flash,
               Graphs 1: Infrastructure and Processing on Modern Hardware, Breadth-First Search (BFS), GPU, joint traversal, Iterative Analysis, Relational,
               Streaming 1: Systems and Outlier Detection, Complex Event Processing (CEP), shared patterns, incremental View Maintenance, outlier detection,
               Approximate Query Processing, relational algebra (RA), bounded RA queries, join, complex ad-hoc queries,
               Networks and the Web, sampling, Viral Marketing, social network, continuous influence maximization problem,
               Data Discovery and Extraction, metadata, datasets, entity resolution project, Functional dependencies,
               Data Integration / Cleaning, Integrity constraints, repair data, data cleaning,
               Spatio / Temporal Databases, spatial and temporal, mining, temporal aggregation, Many-Many Relationships
               Distributed Data Processing, Realtime, Spark, R, SQL-on-Hadoop,
               Graphs 2: Subgraph-based Optimization Techniques, Subgraph querying, Graph Indexing, Subgraph Matching,
               Main Memory Analytics, Multi-Column Sorting, pipelining, Columnar Access, HANA, In-memory columnar databases,
               Interactive Analytics, OLAP, incremental query processing, Prefetching,
               Streaming 2: Sketches, Adaptiveness, sampling, Accurate,
               Transaction Processing, B+-tree, In-Memory Indexing, CPU-GPU, Checkpointing, Main-Memory Database, Deterministic database systems,
               Transactions and Consistency, Weak Consistency, Optimistic Concurrency Control, OLTP, Multicore, contention,
               Query Optimization, Adaptive techniques, cost-based optimizer, Sampling-Based, Multi-Objective,
               Graphs 3: Potpourri, heterogeneous entity graphs, Graph Ordering, In-Memory, scale-free graphs,
               Hardware Acceleration and Query Compilation, Co-Processor, Query Compiler, auto-scale,
               Nearest Neighbors and Similarity Search, Place Retrieval, Unstructured Text, local similarity search, Similarity Join,
    2. SIGMOD has tutorial sessions, which may be good material for learners. There are also demo sessions. It talks about DB design, data processing, graph, stream, spatio-temporal, mining, analytics, in-memory DB, transaction processing, etc. Less NoSQL or distributed engines.

VLDB 2016

    1. topics: (no explicit category, just sampling)
               research track: query processing, event patterns, main-memory column-stores, concurrency control, stream processing,
                               graph processing, query optimization, indexing, in-memory, transactional memory, privacy, distributed join,
                               similarity search, differential privacy, distributed transaction, tensor analytics, big data, Approximate,
                               Spark, RDMA,
               industrial track: Hadoop, Spark, domain-specific languages (DSLs), in-memory, Set queries, bloom filter, RDMA,
                                 distributed in-memory DBMS, Cloud over-booking, Materialized Views, distributed machine learning,
                                 Indexing, Graph Analysis, Query Optimizer, company DB designs, In-Memory,
    2. VLDB talks a lot about in-memory, DB designs, Spark, graph, etc.

hotCloud 2016

    1. topics: cloud bidding, QoS, interactive debugging, software-defined, serverless, multicast in datacenter,
               coflows, KV-cache, Spark, FPGA, Unikernel, Cross-Cloud systems, public cloud Neutrality, VM Introspection,
               deduplication, container design patterns, baremetal big data, tail at scale,
    2. hotCloud papers generally have low reference counts, but the conference does catch the hot topics in cloud. Worth checking out the topics.

ACM TOCS

    1. topics: controlplane OS, network server OS, virtualization, microkernels, kernel, KV store, GPU, Multicore Architectures,
               HW/SW codesigned, SSD, reliability, scheduling, power-efficient, cache, dataplane OS, flash, big data, analytics,
               Voice Personal Assistant,
    2. ACM TOCS talks about topics in computer system designs, or OS designs

ACM TOS

    1. topics: flash, virtualization, manycores, data possession, SMR disk, transaction, persistent memory,
               garbage collecting, RAID, SSD, secure-deletion, memory-mapped IO, NAND, write skew, disk arrays,
               deduplication, sequential and temporal localities, workload, reliability, wear-leveling, predictive,
    2. ACM TOS: its papers come from different places. It is more like an archive. We can find a few good archived papers.

Good papers selected to read

During the tour through each top-level conference of 2016, I selected some good papers to read. Here are the reading notes